Summary
The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. In this episode Ian Schweer shares his experiences at Riot Games supporting player-focused features such as machine learning models and recommender systems that are deployed as part of the game binary. He explains the constraints that he and his team are faced with and the various challenges that they have overcome to build useful data products on top of a legacy platform where they don’t control the end-to-end systems.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
- Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24*7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.
- Your host is Tobias Macey and today I’m interviewing Ian Schweer about building the data systems that power League of Legends
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what League of Legends is and the role that data plays in the experience?
- What are the characteristics of the data that you are working with? (e.g. volume/variety/velocity, structured vs. unstructured, real-time vs. batch, etc.)
- What are the biggest data-related challenges that you face (technically or organizationally)?
- Multiplayer games are very sensitive to latency. How does that influence your approach to instrumentation/data collection in the end-user experience?
- Can you describe the current architecture of your data platform?
- What are the notable evolutions that it has gone through over the life of the game/product?
- What are the capabilities that you are optimizing for in your platform architecture?
- Given the longevity of the League of Legends product, what are the practices and design elements that you rely on to help onboard new team members?
- What are the seams that you intentionally build in to allow for evolution of components and use cases?
- What are the most interesting, innovative, or unexpected ways that you have seen data and its derivatives used by Riot Games or your players?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the data stack for League of Legends?
- What are the most interesting or informative mistakes that you have made (personally or as a team)?
- What do you have planned for the future of the data stack at Riot Games?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Riot Games
- League of Legends
- Team Fight Tactics
- Wild Rift
- DoorDash
- Decision Science
- Kafka
- Alation
- Airflow
- Spark
- Monte Carlo
- libtorch
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Hevo: ![Hevo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/4VC62YUo.png) Are you sick of repetitive, time-consuming ELT work? Step off the hamster wheel and opt for an automated data pipeline like Hevo. Hevo is a reliable and intuitive data pipeline platform that enables near real-time data movement from 150+ disparate sources to the destination of your choice. Hevo lets you set up pipelines in minutes, and its fault-tolerant architecture ensures no fire-fighting on your end. The pipelines are purpose-built to be ‘set and forget,’ ensuring zero coding or maintenance to keep data flowing 24×7. All it takes is 3 steps for your pipeline to be up and running. Moreover, transparent pricing and 24×7 live tech support ensure 24×7 peace of mind for you. Don’t waste another minute on unreliable data pipelines or painstaking manual maintenance. Sprint your way towards near real-time data integration with a pipeline that is easy to set up and even easier to control. Head over to [dataengineeringpodcast.com/hevo](https://www.dataengineeringpodcast.com/hevodata) and sign up for a free 14-day trial that also comes with 24×7 support.
- Select Star: ![Select Star](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/65NZFtJd.png) So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. From analyzing your metadata, query logs, and dashboard activities, Select Star will automatically document your datasets. For every table in Select Star, you can find out where the data originated from, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth in data is built in minutes, even across thousands of datasets. Try it out for free at [dataengineeringpodcast.com/selectstar](https://www.dataengineeringpodcast.com/selectstar) If you’re a data engineering podcast subscriber, we’ll double the length of your free trial and send you a swag package when you continue on a paid plan.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today, that's a t l a n, to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork, and Unilever achieve extraordinary things with metadata.
When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey, and today I'm interviewing Ian Schweer about building the data systems that power League of Legends. So, Ian, can you start by introducing yourself?
[00:01:36] Unknown:
Yeah. So you got my name right. Thanks for that. I'm an engineer over at Riot. I work on a team called Data Central. We are kind of the team responsible for building and maintaining any, like, data-related products or anything like that for any video game that uses the League of Legends game engine. So think like Unity or Unreal; we kinda have our own custom-made game engine. We've got a couple of games that run on there, including, like you mentioned, League of Legends and Teamfight Tactics and Wild Rift. So we've got a couple of sweet games on that. And, yeah, we maintain all the data and ML stuff for it. And do you remember how you first got started working in data? I guess, like, my story is probably a little less interesting than some folks'. I went to college for CS.
I started working at Adobe kind of as an internship and then graduated there on a video streaming team, and I was more of, like, a consultant in that role. Some fun things happened. Corporate stuff happened. I had to go find a new team. So I actually ended up on a data engineering team, and that was, like, right around 2017. I guess I just gained, like, a lot of SME and, like, I had a lot of passion for the space. One of my, I guess, subjects in school was, like, distributed systems and machine learning. So, like, the whole space of big data kinda clicked, made sense there. Then I spent some time at DoorDash.
I think you might have had some DoorDash folks here on the show before. I think you had Sudhir here. Yep. I actually got to work with Sudhir for a while. He's amazing. So I worked there, was one of the earlier engineers on the data platform team, and now I'm over here at Riot. So I've been doing it for, like, I guess, 8 years. Very cool.
[00:03:26] Unknown:
And so in terms of what you're doing now at Riot Games, for anybody who's not familiar with League of Legends, I'm wondering if you can just give a bit of an overview about the game itself and the role that data plays in the overall player experience.
[00:03:41] Unknown:
League of Legends is a 5v5, like, MOBA game. Multiplayer online battle arena, I think, is what it stands for. It was made by our founders when they were back in college. If you're actually really super interested in that, there's, like, a documentary on Netflix. It's actually pretty decent at covering the whole story better than I could. So please feel free to watch that. But the video game itself is 10 people load into a game, and then it's 5 people per side. Each player kinda picks their own champion depending on, like, you know, what they want the characters to look like or how they want to play or whatever. They each take kind of a role, which is to say they either play, like, top lane, mid lane, or bot lane, which are specific kind of divisions of the map that they are kind of in control of, or they'll play more supporting roles, or, like, a jungle, which is kind of a free-roaming kind of character.
And the whole point is to kind of take out the enemy Nexus, which is kind of guarded by turrets. So through, like, a collection of team fights and kind of, like, strategy, you slowly make your way to the opponent's base to take over their turret. And it's a pretty old game. I think the first patch came out in, like, 2008, I think it was. So it's been around for a long time, and it's become one of the most popular, like, online video games kind of throughout the world. We have something around, like, hundreds of millions of players playing, including, like, a kind of competitive esports system as well, that Riot is really kinda invested in.
[00:05:20] Unknown:
So in terms of the data that you're actually working with, I'm curious if you can give a bit of a flavor as to the characteristics of it. So thinking in terms of the 3 Vs of volume, variety, and velocity, whether it's largely structured or unstructured, whether you're dealing with real-time data, and some of the ways that that data gets consumed and repurposed?
[00:05:52] Unknown:
We definitely look a bit more like kind of a traditional big data stack in that sense. So, like, we have the Hadoops, we have the AWS Glues to power everything, yada yada yada. But I guess to, like, actually talk about what we've got going on, we kind of collect data in a couple different forms. We collect data from all the people playing it, so all of our players. So we collect, like, their hardware information, like, what they were doing inside of the game. We kind of do some meta analysis to understand, like, how well do we think they're playing, what kind of rank would we put them on, who should they be kind of playing against?
Team matching kind of stuff is really important there. We collect data from the game servers, which are, like, ostensibly the heart of League of Legends since the game is this kind of server-authoritative architecture, meaning that the server is kind of determining what moves are, like, correct and which ones to kind of ignore. We collect all of the kind of game state data from that. So, you know, a game lasts 30 to 45 minutes. So all the encounters, all the locations, everything like that, we end up collecting to do whatever we need to do with. And then, it being kind of an online game, we have a suite of microservices that do various different things, from, like, allowing players to manage their inventories or, like, whatever champions they bought or anything like that.
So we end up collecting all of that kind of data, so it's a pretty large, I guess, I would probably say it's a pretty decent variety just because, like, its logical division is pretty interesting. We tend to see most of our data be in this kind of, like, unstructured JSON form, and we do a lot of work at the data warehouse end to kind of put schemas on top of that to understand it. So I guess kind of the consequence of that is most of our analyses end up being some kind of batch. We have, you know, some kind of real-time, you know, like, use cases, but it's kinda interesting. A lot of it is definitively like a mini-batch kinda deal where it's like, we don't really wanna take any action for some kind of machine learning models until after a game is done playing, for example.
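As a rough sketch of what that schema-on-read step can look like (the field names and event shape here are hypothetical, not Riot's actual telemetry format), a warehouse-side job might project loosely structured JSON events onto a fixed schema, dropping unknown fields and tolerating missing ones:

```python
import json

# Hypothetical fixed schema applied at read time; extra fields are
# dropped and missing fields default to None (schema-on-read).
SCHEMA = ("match_id", "player_id", "event_type", "game_time_ms")

def apply_schema(raw_line: str) -> dict:
    """Project one loosely structured JSON event onto the fixed schema."""
    record = json.loads(raw_line)
    return {field: record.get(field) for field in SCHEMA}

events = [
    '{"match_id": "m1", "player_id": "p1", "event_type": "item_purchase", "game_time_ms": 412000, "extra": true}',
    '{"match_id": "m1", "player_id": "p2", "event_type": "kill"}',
]
structured = [apply_schema(e) for e in events]
```

In practice this projection would happen inside a batch engine such as Spark rather than a Python loop, but the principle, tolerating schema drift in the raw events while presenting a stable shape downstream, is the same.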
So I wouldn't really call that real time. I would say that that's kind of just after the game, and that's where a lot of our actions actually kind of end up living. So we end up being in this kind of really time-sensitive batch land. And the way we use the data, it ends up breaking into, like, 2 big areas. It lands in, like, decision science, which I think a lot of people listening are probably familiar with. These are, like, your dashboards, your, like, revenue strategies, and everything like that, kind of the typical things you see out of data teams. And then recently, like, the past couple of years, we've been really working more on, like, the machine learning and, like, the MLOps space.
We do a lot of work in, like, the player behavior space because we wanna make sure, like, people are playing fairly. So we, you know, try to determine if they were playing intentionally bad or, like, if they were just having an off day or whatever, and we try to use that to kind of inform models to say, like, hey, this player needs to be banned or something, or, like, this player maybe should go on, like, a 10-day warning or something like that. We have a lot of, like, those kind of behavioral systems going on.
We use a lot of that data to inform our matchmaking. We kinda have a really challenging problem there, which is, like, we're trying to model, like, a really latent property of our players. You know? We're trying to, like, measure what their skill is. It's not just something like a linear regression can just map with 2 or 3 features. Like, we actually need a lot of other models to kind of understand, like, how credible they are, what their normal kind of variance is across champions, what the meta is saying about what kind of champions are good right now. So we do a lot of that kind of activation to try to, like, give better games.
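To make the latent-skill point concrete: even the simplest rating systems treat skill as a hidden variable that gets nudged after every observed outcome. A minimal Elo-style update, purely illustrative and far simpler than the multi-model approach described here, might look like:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under an Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Nudge both ratings toward the observed outcome; k controls how
    fast a rating moves, i.e. how much one game is allowed to tell us."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta

# Two evenly rated players; the winner gains exactly what the loser drops.
new_a, new_b = update(1500.0, 1500.0, a_won=True)  # → (1516.0, 1484.0)
```

The gap between this sketch and a real matchmaking system is exactly the point Ian makes: real skill estimates also have to account for credibility, per-champion variance, and a shifting meta, none of which a single scalar rating captures.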
And recently, we've been trying to kind of get further into the game and actually give them more, like, recommendations. So, like, you know, the game can certainly be really confusing. If you're a new player, there's a lot of systems and a lot of, like, things that you have to interact with to play well. So, you know, we try to give you recommendations around, like, oh, what items we think you should buy, which is the currently launched example. So if you, you know, go into game and you open up the item shop, we'll kinda tell you, like, hey, based on your opponents, based on who you are and what you're playing, here are some items that would be really good for you, or here are items that are good against these opposing champions. And that is this kind of really weird, I'm sure we could talk about it whenever, but this is this really interesting space of, like, embedding machine learning into, like, binaries, which is a really weird space because most of the time I see it in services or, like, in the data warehouse itself.
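One common pattern for embedding machine learning into a game binary (and the episode's links mention libtorch) is to export the trained model from Python as TorchScript, so the serialized artifact can be loaded from C++ with `torch::jit::load`. The model below is a stand-in for illustration, not the actual item recommender:

```python
import torch

class TinyRecommender(torch.nn.Module):
    """Stand-in model: maps a small feature vector to top-3 item indices."""
    def __init__(self, n_features: int = 8, n_items: int = 16):
        super().__init__()
        self.linear = torch.nn.Linear(n_features, n_items)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        scores = self.linear(features)
        return torch.topk(scores, k=3).indices

model = TinyRecommender()
# Trace with an example input and serialize; the resulting file can be
# shipped with the game and loaded in C++ via torch::jit::load(...).
traced = torch.jit.trace(model, torch.randn(1, 8))
traced.save("recommender.pt")
```

The trade-off Ian describes follows directly from this pattern: once the artifact is baked into a build, inference quality is frozen until the next patch ships.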
[00:11:02] Unknown:
That's definitely very interesting, and there are a number of points that I think we'll dig into there. To your point about trying to infer a quantifiable aspect of skill: beyond not being a simple linear regression, it's also not a static value, because as people spend more time on the game, or step away for a long time while the game evolves without them and then come back with a play style that is no longer dominant or no longer fits into the overall play ecosystem, that introduces a number of added variables to the equation, and to being able to capture useful information to figure out things like that and manage those recommendation systems.
And then also, I'm sure there are aspects of data collection and analysis that feed into just the organizational and business elements of running the platform. I'm wondering what are some of the biggest challenges that you're facing either technically or organizationally that are data focused?
[00:12:04] Unknown:
Yeah. I think a lot of our challenges are probably more in, like, the technical space than the organizational space, but they certainly exist. A lot of it comes down to the fact that, like, League of Legends is just a really old game. Like, it's gone through a lot of different changes and migrations such that, like, even the way we just ingest data from, like, the game server has changed hands multiple times, has changed systems multiple times. You know, for example, at one point, we were using, you know, fully Kafka-based ingestion, then we moved to, like, this S3 batch ingestion.
Maybe we moved back or something. So just in that realm, we have a lot of challenges around, like, kind of reconciling the current state of technology with what the old state of the business was. Because, like, to a lot of players, it doesn't really matter if you're running on Kubernetes or not. You know? But, you know, the way we collect that data certainly does. And it's the way that kind of shifts or, like, the distribution of data kind of shifts underneath us. We have to be able to, like, reconcile all of that at ingestion time or maybe even at, you know, training time.
That ends up creating a lot of, like, interesting cases around, like, oh, hey, for this patch, ta-da, like, you know, half of this champion's data got nulled out because of a feature flag or, like, a botched migration that worked everywhere else, but not in live or something. So we tend to have a lot of challenges just trying to move across that kind of, like, large legacy space, I would say. The other big challenge comes down to kind of the way we operate with, like, our various vendors or publishers. Just recently, we announced that Riot will be taking over publishing in, like, the Southeast Asia areas. But, you know, before then, we had, like, another company involved that would kind of help, you know, operate the game for us and facilitate that exchange of data.
This is also true with Tencent and how we operate over in China. So there's kind of a lot of challenges there in, like, the privacy space that we have to think about. And then, yeah, I guess the final point is just we definitely have a lot of, like, derived features and, like, derived metadata that we have to think about. Just to give, like, an example, we used to have this measurement called credibility, which was a way we could understand if a player generally tells us the truth when they report somebody else and they say, like, hey. They were playing poorly or something. And that feature in and of itself fed into many other machine learning models.
So we have to do, like, a whole lot of legwork to transform any of this data into some kind of, like, usable metric, and it goes through many pipes before it even gets to players. So, like, I guess that complexity just kind of grows as the business changes and as we try to model things like what we understand the meta to be, you know, what we understand good champions to be. So I guess it's all maybe a typical answer, but it's all change management. Like, it always is change management, and it always is around, like, how old the system is and how the data changes underneath. And that's certainly true for us.
[00:15:18] Unknown:
And in that aspect of being locked in by some of these legacy aspects and legacy platform decisions, I'm curious if there is a particular interface as far as the data collection or manifestation of data products in the game that has created some of the constraints that you have to work around and ways that that is reflected in how you orient the overall architecture of the platform to be able to fit within some of those constraints that were implemented early in the game's history?
[00:15:55] Unknown:
That's a really fun question. It's funny because, like, I think as League got bigger and, like, as we kind of underwent our own version of the very common migration going on right now in companies, going from monolith to microservices, we still ended up having this, like, big old monolith application, this big monolith Java application that did kind of all the final routing from, you know, whatever is going on outside into the video game. That piece is still very much alive. So there are particular data flows, or, like, this one very explicit kind of JSON object that has to be changed in, like, very particular ways in order for a game to start. So that doesn't bring up any challenges in and of itself for the data team, though it certainly does for all the services teams. But for us, what that means is by the time data gets into the game, there's kind of this really strict rule that, like, data can only be in the game if it was built into the game or if it comes in from this very particular path that is really nebulous, like, really difficult to change, and, like, the effects are super nebulous because of our kind of span throughout the world.
So what that means is, like, we can't just ship a service called, like, the item recommender service that we ask the game server to kind of reach out to. Like, the architecture really kind of baked itself in such a way that it was like, hey, no, if you want this item recommendation data, we don't really have a great way to do that kind of inference unless you just bake it into the game. And I think because games are, like, pretty sensitive to latency, that restriction makes a lot of sense. But as the game kind of grew and as our feature set kind of grew, the wiggle room gets even smaller. Does that kinda make sense? So, like, you know, it becomes more and more difficult to say, like, hey, at the beginning of the game, please reach out to our machine learning service that'll take 2 seconds to compute a feature to give it back to you. Because, like, 2 seconds can make, like, a huge difference for the player experience if they're sitting on a loading screen even longer, if the game hitches because, like, you're loading some bespoke machine learning feature.
So a lot of our architecture ends up having to kind of follow this, like, don't touch the game server if you don't have to. Don't touch the sort of supreme JSON object if you don't have to. And I think I guess, like, it's funny because, you know, when you mentioned that, it's like an onion. Right? There's layers to this in the sense that, like, even security into the game servers is pretty interesting. Like, we don't allow arbitrary TCP connections to the game server. Otherwise, it's like a huge security risk. We have, you know, tons and tons of custom infrastructure in front of the game servers to monitor network activity to, like, cut out any packets that get in the way.
So even just kind of trying to be around the game server and kinda cheat the system a little bit is also super spooky. So, yeah, I think that's definitely 1 of the things that make all of our kind of products really interesting from an engineering perspective because some of these constraints are maybe hilarious, but also serve, like, a really distinct purpose that you don't really wanna, like, try to up and change willy nilly. You really need to, like, understand it. And sometimes that just means going with what is, like, kind of been demonstrated to work. So baking data in the game and just kind of accepting that, you know, inference will always be stale in 2 weeks, so to speak. Or, like, maybe even deciding that, like, this feature is too sensitive to a 2 week staleness, so we're just not going to ship it this way. We're gonna have to kind of invent another way to model this kind of problem.
[00:20:01] Unknown:
Hopefully, that answers your question. Yeah. It definitely does. And one of the other aspects that I was going to dig into is some of what you're discussing around the sensitivity to latency from the end-user perspective, both from the avenue of, I want to provide this machine learning feature or this data input to influence the way that the game experience is going to progress, but also from the perspective of, I need to be very careful about where and how and what type of instrumentation I add into the game to be able to pull data out of it as well. Because the game engine is on a very tight loop to ensure that the user has that interactivity, even if it is dealing with potentially stale network data, the end user needs to experience the game as fluid or else they're just gonna drop out. So how does that influence the way that you think about instrumenting and collecting data from that end-user experience, or where that data collection happens, whether it's on the client side or in the server, or, you know, just monitoring the TCP traffic and trying to rehydrate that into something meaningful?
[00:21:11] Unknown:
It's funny because, like, I think, like, all those constraints also give, like, a really interesting, like, outcome, which is that, like, the game server kind of knows all. It's this kind of all-knowing entity. Like, it knows everything that came into it to construct the game, all the important bits anyway, and it can record everything that happens in the game and tell you otherwise. So you really just need to be able to talk to the game server, which has kind of led to a pattern I haven't seen in too many places. You know, at DoorDash, if you wanted to say, hey, what happened on the entire journey of this delivery from restaurant to merchant to driver, or to Dasher?
You had to kind of merge it across various different systems and maintain many a foreign key to do so. In League, and I think also, I've heard that Fortnite might do something similar. I haven't dug into it. I have a connection over there who was talking to me about it recently. But we basically are in this kind of fortunate place to generate a single artifact from a game, just like a single Parquet file, essentially, of just all the events that happened, when they happened, with all the sort of dimensional data that we care about. I guess the benefit was all of that stuff was in memory at one point in one system. So, like, it's completely referentially correct. Like, there's not gonna be, like, you know, this player 1 bought this item, and then you never see player 1 again. Like, that's never going to happen.
It was proven that it was all kind of in memory at one point, unless the game server crashed or something. So that actually leads to this really interesting dataset that gets generated that we are trying to leverage kind of as much as possible. It's the singular dataset that's completely correct that we can, like, ingest into the data warehouse and say we know everything about this game once the single file finished ingesting, which is, like, a really powerful feature. We can train machine learning models off of this, which we are doing, and we're trying to say, like, you know, if you ever need raw data for inference or something, the latency that this data is available is actually really quick. So you can just kind of hit that, unfurl the file, and then do your inference or whatever. And, yeah, it actually cleans up some data quality problems, but introduces other data quality problems.
A fun one is: if there's ever some flag in this giant multimillion-line C++ codebase that's turned on on one game server, but not the other however many hundreds, you might have a duplication of data or another row inserted. And it's just this random anomaly that you now have to understand: well, what do we want to do with it? So building our pipelines in a way that anticipates that is certainly challenging, but the benefit of this kind of dataset is really powerful. This is one of the only times I've seen this happen, and it's really interesting to think about how it changes things, if that makes sense. If you go and read a blog post from, say, a Netflix, it's interesting because they do all this work to munge all this data and collect it. Maybe we don't have to do any of that, and we can just reap the benefits of their multi-armed bandit system or whatever. So I think that's definitely a really interesting capability we have.
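The duplicate-row anomaly he mentions is usually handled by deduplicating on a stable event key while counting what was dropped, so the misconfigured server shows up in metrics instead of being silently absorbed. A minimal sketch, with hypothetical key fields:

```python
# Sketch: a flag enabled on one game server out of hundreds can emit
# duplicate rows. Deduplicate on a stable key and count the drops so the
# anomaly is observable. Key fields (game_id, event_id) are illustrative.

def dedupe_events(rows, key=("game_id", "event_id")):
    seen, unique, dropped = set(), [], 0
    for row in rows:
        k = tuple(row[f] for f in key)
        if k in seen:
            dropped += 1  # feed this into a metric/alert, not just a log line
        else:
            seen.add(k)
            unique.append(row)
    return unique, dropped

rows = [
    {"game_id": 1, "event_id": 1},
    {"game_id": 1, "event_id": 1},  # duplicate from the anomalous server
    {"game_id": 1, "event_id": 2},
]
unique, dropped = dedupe_events(rows)
assert len(unique) == 2 and dropped == 1
```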
[00:24:47] Unknown:
The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star's data discovery platform solves that out of the box with a fully automated catalog that includes lineage from where the data originated all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you are using, and Select Star will set everything up in just a few hours. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
Another interesting aspect of your situation that I'm curious about is the team topologies and the interfaces that you have from the data engineering and data platform side of things reaching across to the developers of the game and the game engines, and some of the feedback loops that you have available for being able to say, I want to be able to create this type of information, or the game devs saying, hey, I want to be able to use this data element or this type of recommendation to power this feature in the game. And just some of the ways that the organization is structured to facilitate those interactions, and the interfaces that you have to make sure that it doesn't just become a free-for-all of everybody throwing requests everywhere.
[00:26:15] Unknown:
Yeah, definitely. I think it's the requests everywhere and the data everywhere that's the really crappy part of that, because it's hard to actually have good quality data when it's just so random. I think the best way to answer that is to talk about the way League is developed from the game engine perspective. We build the League of Legends game engine, and League Data Central is the team that helps build any data integrations out of the game into the data warehouse, or any frameworks or platforms to help you develop your machine learning products. One of the things we see is that we have this really embedded model, where we'll have data scientists working with individual game designers or game design teams, be it on Teamfight Tactics or League of Legends or Wild Rift or whatever.
And we'll try to understand: what is it that you're trying to solve? What is it that you're looking into? The data scientists are actually empowered to do a little bit of POC work. And once they understand whether the problem is solvable, they can come back to us, or to our overarching pillar, we're under an organization called Tech Foundations, which tries to provide all the infrastructure for all the game teams. We're then able to say, hey, this is a game engine feature, let's go work with the game engine team to make it more general for other teams. So that flow, from data scientists working with a specific game designer out to general change within any of the infrastructure tools, is something we see a lot.
But we also see a decent amount of emergent work, in the sense of lots of folks coming and asking, can you help us interpret this dataset? Can you help us understand what's going on here? And for that, we're trying to solve it with some self-serve tools. We're really exploring the world of data contracts, in the sense of allowing engineering teams or product teams to define what they want their data to be, and then just helping them find it. We find the tool Alation to be useful here because it gives people a way to look over the entire catalog of all the collected data, and we try to keep that somewhat up to date. So if there's some particular change they're looking for, they can just reach out to the owning team. But I definitely think the main avenue is this embedded model of data scientists really understanding the problem space from the game designer's perspective, transforming that into a data problem, and then working with however many teams across the central Tech Foundations pillar to really bring that change out.
Since we build a lot of our own tools, from the way we build the game, to the way we ship the game, to the way we design characters, we have a lot of ability to go in and say, hey, we're just going to build this new component within the artist tool to let them do that. So the folks on my team have to be pretty versatile: on any given day, we're doing something in C++, something in Spark, or something as low level in the game engine as possible. But it's all in service of trying to help data scientists discover ways that they can give data to game designers, which is, I think, a really interesting challenge.
[00:29:59] Unknown:
In terms of the current platform architecture that you've settled on, I'm wondering if you can give an overview of at least the kind of core elements and the aspects of the platform capabilities that you are optimizing for.
[00:30:17] Unknown:
There is a lot of history here. There's this really good blog post that some engineers at Riot wrote, it's seven parts or something, about how the container infrastructure has evolved and the way we ship services, and it's actually a really good demonstration of why we've had to make so many changes. But where we've settled, at least on the data platform side, is that we have an individual collection service at our edge network that collects all the data from players, and that feeds into a fairly real-time, Kafka-based ingestion system that is maintained between us and a central data team at Riot. So a lot of data comes in through that and then gets Kafka'd into S3, into our Hive data warehouse.
We have a separate Java service that we maintain that talks to the game server to ingest that file I mentioned, parse it out, understand it, and push it into the data warehouse, also over S3, although that one's not super Kafka-y. And then we have an internal collection service, similar to the one that's customer facing, but this one generally takes data from microservices. It goes through a similar path, but it's a little more vetted, a little more secure, so there are some shortcuts and optimizations there. Once those all land in S3 buckets, be it through Kafka or through custom services, we sit on top of AWS Glue and Hive, and we've partnered pretty heavily now with Databricks, since we leverage Delta tables a lot to get the ACID constraints that we might want in certain tables. And then from there, we use Airflow to do all the orchestration of all of our Spark jobs.
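The ACID behavior he gets from Delta tables is essentially MERGE-style upsert semantics: match incoming rows to existing rows on a key, update the matches, insert the rest, atomically. As a toy, in-memory illustration of those semantics (plain Python, not Delta or Spark code; in production this would be a single transactional `MERGE INTO`):

```python
# Toy illustration of MERGE-style upsert semantics: match on a key, update
# matched rows, insert unmatched ones. Field names are made up.

def merge_upsert(target, updates, key="game_id"):
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        by_key.setdefault(row[key], {}).update(row)  # update-or-insert
    return list(by_key.values())

existing = [{"game_id": 1, "winner": "blue"}]
incoming = [{"game_id": 1, "winner": "red"},   # late correction
            {"game_id": 2, "winner": "blue"}]  # new game
merged = merge_upsert(existing, incoming)
assert len(merged) == 2
assert {r["winner"] for r in merged if r["game_id"] == 1} == {"red"}
```

The point of doing this inside a Delta table rather than raw Parquet on S3 is that concurrent readers never observe the half-applied state.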
It all ends up in Tableau or some kind of front-end dashboarding tool; generally, Tableau is what we see. And then for the data activation cases, depending on what we're looking at, sometimes it's another team and we'll just dump into their S3 bucket. We're a pretty big AWS shop, so lots of tooling exists to share files across different AWS accounts without too much hassle. When we talk about the game, it's interesting because at that point we're linking into our custom compiler system. We have this really big build farm out in Las Vegas that builds the game under various compiler settings, or various shader settings, if you will, to ingest the graphical data and stuff like that.
So we'll hook into that on a batch cadence to say, here's the new item recommendation data, or here's the new detection algorithm to determine, did a player play top lane, or did they actually play bottom lane even though they said they were going to play top lane? That kind of inference about how a player actually played, all of that is translated from some Spark code into some C++ code that gets compiled into the game. And that one's really fun, because I don't know if you've ever done the exercise of trying to take a decision tree and actually codify all of its splits out into if-else statements.
But actually seeing it rendered that way is really hilarious. So we have a couple different things that end up going through this translation layer, if you will, from Spark into compiled C++. It's pretty interesting, and it works fairly well. But, yeah, I think that's the breadth of it. It's mostly a Hive shop, just with some really interesting data activation cases.
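The decision-tree-to-if-else exercise he describes can be sketched as a tiny code generator. The tree representation here is a toy nested dict, and the feature names are invented; this is not the team's actual translation layer, just the shape of the idea:

```python
# Sketch of "unrolling" a trained decision tree into C++ if/else statements
# so it can be compiled into a game binary. Tree format and feature names
# are hypothetical.

def tree_to_cpp(node, indent=1):
    pad = "    " * indent
    if "leaf" in node:  # terminal node: emit the predicted class
        return f"{pad}return {node['leaf']};\n"
    code = f"{pad}if ({node['feature']} <= {node['threshold']}) {{\n"
    code += tree_to_cpp(node["left"], indent + 1)
    code += f"{pad}}} else {{\n"
    code += tree_to_cpp(node["right"], indent + 1)
    code += f"{pad}}}\n"
    return code

tree = {
    "feature": "gold_earned", "threshold": 5000,
    "left": {"leaf": 0},
    "right": {"feature": "kills", "threshold": 3,
              "left": {"leaf": 0}, "right": {"leaf": 1}},
}
cpp_body = tree_to_cpp(tree)
assert "if (gold_earned <= 5000)" in cpp_body
assert "return 1;" in cpp_body
```

The generated body would be wrapped in a C++ function returning an enumerated value, which matches his earlier point that in-engine code can only return constants baked into the binary.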
[00:34:20] Unknown:
As far as the delivery step of the ML models in particular, you mentioned that those get baked into that compiled binary. And I'm curious what types of constraints you have around the actual size of the binary for being able to deliver it to end users, because, obviously, you don't want to bake in, you know, the entire GPT-3 model into a binary that somebody has to download and play. You want it to actually fit on their hard drive.
[00:34:50] Unknown:
Hey, by the way, in order for you to play this character, you have to download the 2 terabytes of all player data so we can give you one single number. That'd be fun. No, it's really weird. There's a lot of philosophical change, I think. When you're in service land, and we have a couple of services that are just microservices to detect, did you feed in game, those are really nice, because we've left the game server, we've come back into microservice world, and we have your standard network call procedures. But once you get into the game server, it's like, oh man, someone's just going to call this as a function, and we can't go and make separate network requests. We can't pull anything off of a hard disk or anything like that. We just have to bake all of this into RAM and constants and code that return some enumerated value that the game engine can understand.
So the constraints really come down to: is this even maintainable? A really good example is our item recommender, where it's actually a really challenging problem to know if someone selected an item that we recommended to them. Right? Because even though that recommendation happens in the game, we don't know what their decision was until after the game ended, and we ingested it, and we parsed it out, and we reinterpreted the meta and everything.
So that delay between inference and what actually happened is already super challenging. And on top of that, we have to ship things that the game engine understands, so there are a lot of format problems that we end up running up against. It's pretty funny, because we have these automated recommenders generating data that gets baked into binaries. So if we have a bad day, or if Databricks just loses a container and only half-correctly finishes a task and generates invalid JSON data, we break the entire build of the video game.
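The "invalid JSON breaks the build" failure mode suggests a validation gate between the batch job and the build farm: reject half-written output before it ever reaches the compiler. A minimal sketch; the payload shape and item/champion ids are made up for illustration:

```python
# Sketch of a validation gate: before recommender output is baked into the
# game build, reject truncated or malformed JSON so a flaky batch job can't
# break the build for every game developer. Schema check is illustrative.
import json

def validate_recommender_payload(raw: str) -> bool:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    # minimal shape check: a non-empty list of recommendation records
    return isinstance(data, list) and len(data) > 0

assert validate_recommender_payload('[{"champion": 1, "items": [3006]}]')
# A half-written file from a lost container fails fast instead of breaking the build:
assert not validate_recommender_payload('[{"champion": 1, "items": [3006')
assert not validate_recommender_payload('[]')
```

In practice a gate like this would sit in CI, keeping the last known-good payload in place when validation fails.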
That's not very good for a video game company with lots of video game developers who need to build the video game. If you break the build, everyone gets frustrated. We have a lot of really interesting lines to walk depending on where we're shipping the service: whether it actually ends up in game, whether we need developers to be able to call a function and get a value back, or whether they're instead using this data system that's been built, which is maybe a little more flexible but, from a game developer's perspective, feels a bit more like an RPC call as opposed to a function call. It's really funny because it should be fairly easy, but once you start to think that you only have, you know, 60 picoseconds or something to return a function value, all of this becomes much more complex. So the models we end up shipping tend to be as pickled as possible, as compressed as possible, or as reduced as possible, such that if we do have to send layers of a neural network, for example, we only send the minimum amount of weights, or some minimum amount of data from which we can reconstitute the model in game, in C++, with as little loss as possible.
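One common way to ship "the minimum amount of weights we can reconstitute the model from with as little loss as possible" is quantization: store each float weight as an 8-bit integer plus a shared scale and offset. This is a generic technique, not a description of Riot's actual format:

```python
# Sketch of shrinking model weights before shipping: affine 8-bit
# quantization. Each weight becomes one byte; the receiver reconstitutes
# floats from (q, lo, scale). Purely illustrative.

def quantize(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 if hi != lo else 1.0
    q = [round((w - lo) / scale) for w in weights]  # ints in 0..255
    return q, lo, scale

def dequantize(q, lo, scale):
    return [lo + v * scale for v in q]

weights = [0.0, -1.5, 2.25, 0.75]
q, lo, scale = quantize(weights)
restored = dequantize(q, lo, scale)
# Round-trip error is bounded by the quantization step.
assert all(abs(a - b) <= scale for a, b in zip(weights, restored))
```

At 8 bits per weight this is a 4x size reduction over float32 before any compression, which matters when the payload rides inside a client download.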
And that whole iteration loop is really challenging. I guess it's similar to the Android problem, in the sense that someone could still be running a really old version of Android on a really old phone. You could play League of Legends on a toaster if you try hard enough. So we need to understand: can we actually take the time to reconstitute this model on everyone's machine, or do we have to bake in overrides if we detect that, hey, your machine is just not going to be able to run this in a reasonable amount of time? It's kind of a smart default. I know I'm kind of dodging the question, but it's mostly because we have a couple different models sitting in game, and they all have different avenues and different constraints, all shaped by this latency we're having to play against and the variety of computers that can play League of Legends. Computers and/or toasters, I suppose. Yeah.
[00:39:23] Unknown:
And that aspect of being able to dynamically toggle whether or not you're actually going to use a particular model or enable a particular feature, based on the resource capacity of the machine that's running the game and the end user experience: I'm curious how that factors into the quality and consistency of the data that you're able to collect under those resource constraints, and some of the ways that you factor that into the platform and transformation designs to manage that lack of consistency. Both because people aren't necessarily all going to be running the exact same version of the game, because they haven't bothered to download it yet, and because their machine might not be able to run the instrumentation that would give you that additional feature or piece of information from their experience.
[00:40:18] Unknown:
It's interesting, because I feel like we're going to be solving that problem ad nauseam; it's never really going to go away. The things we've found that best help us mitigate it come from leaning into the MLOps model, or rather the DevOps model: the idea that if data scientists are shipping something, let's ship a package. Let's ship something that's version controlled, with unit tests, with as much faked data as we can. And a significant amount of effort goes into alerting. I wouldn't necessarily call it anomaly detection, but just trying to capture the cases where, hey, this game only has 80% of its data reported. I don't really need it, I'm just going to get rid of it, but I'm going to tick a counter.
And if this counter gets too high, I'm going to page an engineer to see if they can come in and do something about all this missing data. So we really try to leverage this kind of modular development, with unit tests and integration tests, to catch and test these things. We work pretty closely with our game analysis team to say, hey, we've made this kind of nebulous change, or, you know, it works on my machine, it works on all the build machines, but can you go through and try to understand how it's going to work with random users in PC bangs? Or how is this going to affect people who play this particular champion? Because we have 180 champions, and we didn't playtest all 180 of them. Can you help us understand that? Riot has a pretty good QA team and a pretty good QA system to mock that stuff out, so we leverage that as much as we can.
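The "drop the incomplete game, tick a counter, page an engineer when it gets too high" pattern he describes can be sketched in a few lines. The thresholds and the idea of returning a page signal are hypothetical stand-ins for real alerting infrastructure:

```python
# Sketch of the drop-and-count pattern for games reporting incomplete data:
# tolerate individual bad games, but page a human when the rate gets too
# high. Thresholds are illustrative.

class MissingDataMonitor:
    def __init__(self, threshold=100, min_completeness=0.8):
        self.threshold = threshold            # dropped games before paging
        self.min_completeness = min_completeness
        self.dropped = 0

    def observe_game(self, completeness: float) -> bool:
        """Drop incomplete games; return True when an engineer should be paged."""
        if completeness < self.min_completeness:
            self.dropped += 1
        return self.dropped >= self.threshold

monitor = MissingDataMonitor(threshold=3)
pages = [monitor.observe_game(c) for c in (0.5, 0.95, 0.6, 0.7)]
assert pages == [False, False, False, True]  # third dropped game triggers the page
```

The design choice is that a single 80%-complete game is noise, but a run of them is a signal worth a human's attention.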
And then I think a big part of it is that we're not afraid to be a little more experimental with the data that we collect. Since we operate in a bunch of different shards and a bunch of different regions, and they all get deployed separately, we have a little more flexibility to say, hey, try this thing out in this region for a month, collect as much data as possible, wait for these kinds of events to happen, see what our flux is, and then address it in our pipelines accordingly. I guess the best way to answer it is that our rollout for models tends to be pretty long, as opposed to other shops where sometimes shipping a model was just uploading weights to an S3 bucket, and all of a sudden we had the model shipped.
We tend to have a more structured rollout plan. We monitor metrics very closely, and we work with change advisory boards really closely. I'm not sure if that really answers the question, but I guess the answer is that we just try to do a lot of DevOps practices, and we roll out pretty slowly to gain a lot more confidence, until we're at that 95% mark and we say, okay, any variance in the data now we'll just treat as an emergent issue, or an on-call issue, and update accordingly. It certainly still happens. We've just found that trying to get in front of it in the development process saves us the most headache.
[00:43:49] Unknown:
Data engineers don't enjoy writing, maintaining, and modifying ETL pipelines all day, every day, especially once they realize that 90% of all major data sources like Google Analytics, Salesforce, AdWords, Facebook, and spreadsheets are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from over 40 countries to set up and run low-latency pipelines. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get real-time data flow visibility with fail-safe mechanisms and alerts if anything breaks, preload transformations and auto schema mapping to precisely control how data lands in your destination, models and workflows to transform data for analytics, and reverse ETL capability to move the transformed data back to your business software to inspire timely action.
All of this, plus its transparent pricing and 24/7 live support, makes it consistently voted by users as the leader in the data pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata today and sign up for a free 14-day trial that also comes with 24/7 support. Another interesting element of how to think about the design and operation of data platforms and data systems, particularly given the longevity of the product that you're working on, is how to reduce the overhead of the onboarding experience, so that as new teammates come onto the team, or as people need to work cross-functionally, somebody working on the game engine can just self-serve some aspect of data.
How do you think about the design and user experience of those platform components to be able to reduce the level of effort required to manage that onboarding and being able to come up to speed and become effective?
[00:45:50] Unknown:
That's actually a really big challenge that our data engineering team is thinking about, because their on-call load can be quite high, depending on the time of year. If Worlds is coming up or something, and there are a lot of analysts who are maybe a bit more curious about something at that point in time, their on-call load can be pretty high, because there are a lot of questions of, where is this data? Can I trust it? Etcetera. And given that the team is pretty small, we tend to leverage vendor tools to help us as much as we can here.
We're spiking out the use of Monte Carlo to see if we can reduce the number of times people come in and say, hey, this data looks weird for this one hour, which is the one hour I care about of all hours. I think I mentioned before, we heavily use Alation to make sure that we have an up-to-date catalog that we try to funnel people to as much as we can. But I think the challenge of onboarding at League is one that exists kind of everywhere. We have this internal wiki for game engineers who are getting caught up, and it says, in big bold headline text, it will take you anywhere from 6 to 8 months to be productive.
They're just not trying to sugarcoat the fact that this is many years' worth of work, and there's probably no way to get caught up quickly. So from the data perspective, we generally try to lean as much as possible into: use our UDFs, use our defined libraries, use our tools, because they'll help you avoid common pitfalls, or save you time. A decent example is that a new analyst might come in and naively try to query all the game summary data for one particular shard.
But shards go by many names, so you might not even know you need to join against some other dimensional table. So we try to provide libraries or tools that are installed in your Databricks notebook by default, and we try to lean you into using those as much as possible, until you feel comfortable enough to start writing raw SQL against the Hive tables. It's definitely a really interesting problem that we're still trying to nail down. I think because there's this accepted bar of difficulty across all the different League of Legends teams, you see a lot of different individual onboarding projects surface.
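The "shards go by many names" pitfall is the kind of thing those default libraries can paper over: a helper that normalizes shard aliases before the analyst ever writes a query, so they don't need to know about the dimensional join. All names and aliases below are made up for illustration:

```python
# Sketch of a shard-alias helper of the sort a default-installed library
# might provide, so new analysts don't silently query the wrong shard.
# Aliases and canonical names are hypothetical.

SHARD_ALIASES = {
    "na1": "north_america", "na": "north_america",
    "euw1": "europe_west", "euw": "europe_west",
}

def canonical_shard(name: str) -> str:
    key = name.strip().lower()
    if key not in SHARD_ALIASES:
        raise KeyError(f"unknown shard alias: {name!r}")
    return SHARD_ALIASES[key]

# Different spellings of the same shard resolve to one canonical name.
assert canonical_shard("NA1") == canonical_shard("na") == "north_america"
```

Failing loudly on an unknown alias is the point: a typo becomes an error at query-build time instead of an empty (or wrong) result set.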
I was part of one that tries to say, hey, if you're a new services engineer, you come to this class and we teach you all the services infrastructure over the course of a week and a half. There's a game boot camp that we hold, so new people coming into game engineering, or folks transitioning into game engineering from another team, might spend 3 or 4 weeks just going over individual components of the game engine and how it ships. So, hopefully, if you ask me again in 2 or 3 years, we'll have a more structured answer. But right now, we're really trying to solve it with vendor tooling and libraries, to reduce the friction of getting into the ecosystem and just start playing around.
[00:49:18] Unknown:
In your experience of working on the League data team, what are the most interesting or innovative or unexpected ways that you've seen data and derivatives of the raw data used either by Riot Games as an organization or by the players?
[00:49:34] Unknown:
Yeah, I definitely think for me, it's this translating of decision trees or neural networks or whatever into C++ code. It is just such an interesting experience to have to leave model-training world, which everyone now thinks of in terms of Python libraries or Jupyter notebooks, and come into the world of: no, this is a C function that allocates a pointer, sticks a weight into it, and then multiplies it out. That is just such a trip. And it is one of the most powerful things we have, because we can really deliver customized experiences to all of our players.
But it's also one of the things that I know, if I don't go to a game company after this, is just going to rot in my brain immediately. I'm never going to need to think about how to serialize a decision tree into C++ ever again. Maybe I will, but I have my doubts. So there's that. And I also think we're fortunate to have all of the data funneled through the game server. It saves us a lot of grunt work making sure that our data is consistent with itself. We can just trust the game server to do what it does, and then put checks in place to make sure that nothing crazy happened, and no crashes happened along the way.
So that's really nice. I remember at Adobe, we spent a long time just trying to build systems to join various datasets. Having all of that go away right up front is really interesting.
[00:51:18] Unknown:
In your experience of working on the team and helping to implement some of those technological capabilities, what are the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:51:30] Unknown:
I think for me, the biggest thing has been that very little of being a machine learning engineer, if you will, is statistics. Very, very little. A lot of it is understanding your model. We've had outages with models that took 60 days to manifest into any kind of problem, but then the problem they manifested into was hundreds of players being banned for no apparent reason. And it's like, well, why did that happen? Having to trace the data through the pipelines and understand how you're computing a feature, or even realizing that the golden dataset we trained this model on is now just too old, and we need to recollect it and try again, has bitten us time and time again. Not actually knowing whether our model is working as expected is an ever-growing pain for us. At any point in time, without proper monitoring or proper tooling or proper testing, any given model we have is a ticking time bomb, and that has nothing to do with the fact that we used stochastic gradient descent with some particularly hypertuned parameter.
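The "golden dataset is too old" failure he describes is a drift problem, and even a crude guardrail catches a lot of it: compare a feature's live distribution against the training snapshot and flag retraining when it shifts too far. A deliberately simple sketch (real monitoring would use proper statistics such as PSI or a KS test; the tolerance here is arbitrary):

```python
# Sketch of a cheap drift check: compare a feature's live mean against the
# mean from the "golden" training snapshot, and flag retraining when the
# relative shift exceeds a tolerance. Illustrative only.
from statistics import mean

def needs_retraining(golden_values, live_values, tolerance=0.2):
    g, l = mean(golden_values), mean(live_values)
    denom = abs(g) if g != 0 else 1.0
    return abs(l - g) / denom > tolerance

# Stable feature: no flag. Shifted feature: retrain before it bans someone.
assert not needs_retraining([10, 11, 9], [10.5, 9.5, 10])
assert needs_retraining([10, 11, 9], [15, 16, 14])
```

A check like this would have surfaced the slow 60-day divergence as a metric trend long before it manifested as wrongly banned players.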
It all has to do with how we've deployed it and how we've engineered the system to work. And as I've left school and come into a couple different teams now doing data, it's become more and more apparent to me that the skills I have are in the realm of building better systems, not necessarily in designing better algorithms. And it turns out that that's a huge problem for a lot of organizations, so I'm quite lucky to have that skill set. Absolutely.
[00:53:30] Unknown:
I can relate in that regard. Yeah. As far as the work that you've been doing at Riot Games and on the League Data team, what are some of the most interesting or informative mistakes that you've made, either personally or at a team level?
[00:53:44] Unknown:
Oh, I think for us, it's definitely always been around experimentation. It wouldn't surprise me if, in the near future, our team spends a lot more time building tools to help us do experiments more easily and define what an experiment even is. Launching a new service too soon could lead to corrupt data for a month. So now, for the rest of time, all of your pipelines have to take into account the fact that you shipped the service poorly for one month in August 2020 or something. On the flip side of that, waiting too long to get in front of poorly playing players in Europe or something, not that Europe players play badly, but just any particular area of the world, taking too long to get in front of that and solve it from a data perspective could mean we lose players. It could mean that people change their reputation. It could mean that more toxic people show up to that shard, and now we have Reddit posts flaming us for days or whatever.
And that's just an ongoing problem that we've never really taken the time to address. So I think for me and for a lot of our teammates, you'll always hear us be like, oh, dang, I wish I'd spent two more weeks just thinking about this so we didn't have to deal with it. Or, we spent way too long polishing the thing, and by the time we shipped it, no one even cared anymore, because all the hype was gone, or Fortnite beat us to it, or whatever. So I think for us, it's that question of how you balance being the shop everybody wants to be, where you move fast and break things, while also accepting the fact that if you mess up data for months, you're going to have to live with that for years.
Yeah, I think that's definitely, at least for me, the biggest learning, and I know for a couple of the coworkers I talk with, that tends to be a thing that we all vent about at various times. Absolutely.
[00:55:58] Unknown:
And as you continue to build and iterate on the systems that you're supporting and work with the game developers and game engine teams, what are some of the things you have planned for the near to medium term of the data stack or any projects that you're particularly excited to dig into?
[00:56:14] Unknown:
We're really looking into what live inference means in a game engine, and how we can change the fact that we're afraid to get into the game engine because of network connectivity, network constraints, or performance constraints. If instead we treat those constraints as just parameters to build the system against, what could we do? Is there a world where we embed LibTorch into the game, and that actually is not prohibitively difficult to ship, and players can still use it? In-game inference is definitely an area we're getting into that I'm really excited about. It's a bit less the research-y reinforcement learning bits, and more just: how could we better personalize an experience? How could we tell somebody, hey, now's a good time to join a team fight, or something like that? So I'm really excited about that space. I'm also really excited about building more generalized tools for game developers, so they have this stuff out of the gate. As more and more games come out of the League engine, it has a bigger reputation to uphold. So if we build a system that really well describes the meta of TFT, imagine some system that defines it in terms of sentences, so a game designer can take it into their world and articulate it, then a new game coming up on the League engine is like, hey, we want that. We want what TFT has, because who else is making a competitive card game in the space?
So I think further generalizing our tools, and these kinds of ML systems, is something I'm really excited about. Hopefully, as the years go by, they become more and more robust, and I can talk about them again in another conversation like this.
[00:58:18] Unknown:
Are there any other aspects of the work that you're doing on the league data team or the data challenges or applications that you're supporting that we didn't discuss yet that you'd like to cover before we close out the show? Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:58:45] Unknown:
You know, it's funny. I've been listening to the show for a while, so I knew this question was coming, and I wondered what my answer would be. I feel like this is really the time to shine, but it's hilarious, because I think my answer is: I wish there were something, be it a technology, a tool, or just a better course, that helps explain database fundamentals in the world of machine learning. I feel like a lot of the challenges we end up solving, if you go back far enough in the database literature, you find the exact same problem, just phrased in business intelligence words.
So a better way to go back and uncover all of that research from database management systems, I really wish existed. I wish I didn't have to read something about how Uber solved their shuffling problem and then say, "This sounds like a replication problem. Didn't MySQL solve this?" And then, sure enough, there it is in some 1980s textbook sitting in the UC Davis library. I wish that didn't have to happen all the time, but we stand on the shoulders of giants as it is, and sometimes you don't know who the giants are. And everything old is new again. Yeah, everything old is new again. So I'm trying to take it upon myself, as I discover those things, to write blog posts or something that say, "Hey, if you remember solving this in your organization, so did Postgres in the 1990s. Here's how they did it. Let's see what we can take from that."
[01:00:23] Unknown:
Absolutely. Well, thank you very much for taking the time today to join me and share the work that you've been doing on the League data team, some of the challenges that you're facing, and some of the solutions that you're building around them. It's definitely a very interesting problem space, so I appreciate you taking the time to share it, and I hope you enjoy the rest of your day. You too, man. I appreciate you having me on the show.
[01:00:48] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to Ian Schweer and His Role at Riot Games
Overview of League of Legends and Data's Role
Characteristics and Collection of Game Data
Challenges in Data Management and Legacy Systems
Constraints and Architecture of Data Collection
Unique Data Generation and Quality Control
Team Collaboration and Data Integration
Current Platform Architecture and Tools
Model Constraints and Delivery in Game
Onboarding and Reducing Overhead
Innovative Uses of Data at Riot Games
Lessons Learned and Challenges Faced
Future Plans and Exciting Projects
Closing Thoughts and Contact Information