Summary
There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes of the companies that run these systems at scale so you don’t have to? In this episode Will Smith shares the journey that he and his team at Linode recently completed to bring fast and reliable S3-compatible object storage to production for your benefit. He discusses the challenges of running object storage for public usage, some of the interesting ways that it was stress tested internally, and the lessons that he learned along the way.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host is Tobias Macey and today I’m interviewing Will Smith about his work on building object storage for the Linode cloud platform
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the current state of your object storage product?
- What was the motivating factor for building and managing your own object storage system rather than building an integration with another offering such as Wasabi or Backblaze?
- What is the scale and scope of usage that you had to design for?
- Can you describe how your platform is implemented?
- What was your criteria for deciding whether to use an available platform such as Ceph or MinIO vs building your own from scratch?
- How have your initial assumptions about the operability and maintainability of your installation been challenged or updated since it has been released to the public?
- What have been the biggest challenges that you have faced in designing and deploying a system that can meet the scale and reliability requirements of Linode?
- What are the most important capabilities for the underlying hardware that you are running on?
- What supporting systems and tools are you using to manage the availability and durability of your object storage?
- How did you approach the rollout of Linode’s object storage to gain the confidence that you needed to feel comfortable with full scale usage?
- What are some of the benefits that you have gained internally at Linode from having an object storage system available to your product teams?
- What are your thoughts on the state of the S3 API as a de facto standard for object storage?
- What is your main focus now that object storage is being rolled out to more data centers?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Linode Object Storage
- Xen Hypervisor
- KVM (Linux Kernel Virtual Machine)
- Linode API V4
- Ceph Distributed Filesystem
- Wasabi
- Backblaze
- MinIO
- CERN Ceph Scaling Paper
- RADOS Gateway
- OpenResty
- Lua
- Prometheus
- Linode Managed Kubernetes
- Ceph Swift Protocol
- Ceph Bug Tracker
- Linode Dashboard Application Source Code
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, and a 40 gigabit public network, you've got everything you need to run a fast, reliable, and bulletproof data platform. If you need global distribution, they've got that covered too with worldwide data centers, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances, and they've got GPU instances as well.
Go to dataengineeringpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Will Smith about his work on building object storage for the Linode cloud platform. So Will, can you start by introducing yourself?
[00:01:10] Unknown:
Yeah. Hi. I've been with Linode for about 5 years now. I originally worked on our transition from the Xen hypervisor to KVM, which was a super exciting project. I since then moved on to working on the new API that we launched 2 years ago. And after that, I moved on to the object storage project. And do you remember how you first got involved in the area of data management? Well, Linode does a lot of data management, what with hosting virtual private servers and everything. But specifically with object storage, they already had a prototype developed of an object storage cluster based on Ceph.
And they just wanted to bring a team of developers in to productize that,
[00:01:51] Unknown:
which is kinda where I came in. And so you mentioned that you'd started off with this Ceph prototype. I'm wondering if you can just give a bit of an overview of the current state of what you have available for object storage on the Linode platform, and what the motivating factor was for building and managing your own object storage rather than building an integration with another offering such as Wasabi or Backblaze?
[00:02:18] Unknown:
Absolutely. Right now, our object storage service is available in Newark and Frankfurt, and we've got more locations planned. We offer a fully S3-compatible API, and that means that it can plug into basically any tool or service that already uses object storage, which is fantastic. We also offer static site hosting on the platform. It's got full integration with all of Linode's first party tools. And there's a promotion going on right now where we're giving it away for free until May. So if you haven't used it, you should check it out. As for why we built this out ourselves instead of partnering with someone, we had internal uses for this, and we wanted to integrate it into our platform so that we could use it however we wanted and saw fit, without any restrictions.
We also already had a lot of organizational expertise in hosting Ceph, because it's what powers our block storage product.
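To make the "plugs into basically any tool" point concrete, here is a minimal sketch using boto3, the standard Python S3 SDK. The endpoint URL, bucket name, and environment variable names are illustrative assumptions rather than anything confirmed in the episode; the actual cluster endpoints are in Linode's documentation.

```python
# A minimal sketch: pointing a stock S3 client at an S3-compatible endpoint.
# The endpoint URL and credential variable names below are assumptions for
# illustration; consult the provider's docs for the real values.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://us-east-1.linodeobjects.com",  # illustrative endpoint
    aws_access_key_id=os.environ["OBJ_ACCESS_KEY"],
    aws_secret_access_key=os.environ["OBJ_SECRET_KEY"],
)

# From here on, generic S3 tooling works unchanged.
s3.create_bucket(Bucket="example-bucket")
s3.upload_file("report.csv", "example-bucket", "reports/report.csv")
print(s3.list_objects_v2(Bucket="example-bucket").get("KeyCount", 0))
```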
[00:03:08] Unknown:
So it just made sense: we already ship hardware to data centers all the time, so we would stand up clusters ourselves and integrate it into our platform for customers, in addition to our own use. And so given the fact that you are already using Ceph for the block storage, it seems pretty obvious that you would end up using it for the object storage capabilities as well. I'm wondering if you can just give a bit of a discussion of
[00:03:32] Unknown:
some of the trade offs of using Ceph for the object storage as opposed to something like MinIO or 1 of the other available projects, or just building it out yourself from scratch? Well, building it out from scratch was never on the table, because we wanted a much faster time to market than that would allow. We didn't really consider alternatives like MinIO because we already had a team that knew how Ceph worked and how to administrate it pretty well. So once they set up the object storage gateways on a new Ceph cluster and saw that it all worked pretty much how they expected, it was a clear choice that we should use it. And in terms of the scale that you're building for and the scope of usage that you had to design for, I'm wondering how that impacted
[00:04:13] Unknown:
the overall design and testing and implementation of the rollout of the object storage capabilities.
[00:04:20] Unknown:
Absolutely. So we wanted Object Storage to be a Cloud primitive available with Linode. So when you sign up for an account with our platform, it's just another tool in your tool belt. And that means that it needed to support a broad swath of use cases. Everything from just hosting user uploaded files to long term data storage or big data with a lot of movement and IO to static site hosting that might let new people join the platform without necessarily needing to spin up and maintain a server. But that presents an interesting problem, which is that as a consumer of object storage, it appears to you as if the buckets you create have unlimited space for you to upload objects.
When in reality, there is actually a physical server that only has so much space to store things. So a big part of the initial effort of the project was trying to figure out how we could scale the clusters to keep up with demand, and we played with a whole lot of things. We played with scaling Ceph the way that they recommend you do so, and there's a big paper that CERN published alongside Ceph that claims to have scaled it up to thousands of nodes and hundreds of petabytes of data. We played with crazier things, like scaling multiple Ceph clusters together to appear as 1 cluster to a customer, which was fun and exciting. But, ultimately, the problem is pretty much the same as just maintaining availability for new servers in a data center. When a customer requests a virtual server, we need to make sure that we have 1 to give them all the time. So this is just a different version of that problem that we already work with all the time. And in terms of maintaining
[00:05:54] Unknown:
compatibility with the S3 API, I know that Ceph out of the box has that capability. But what are some of the issues or challenges that you see as far as being able to keep up with some of the recent additions to the S3 API, for things like the S3 Select API or anything like that? I think for now, we don't intend to support anything that isn't supported by the Ceph system. Their project is very active, and they do add new layers of
[00:06:20] Unknown:
compatibility all the time. So any work that we spent trying to implement that on top of their system would probably eventually be redundant anyway. And so as far
[00:06:30] Unknown:
as maintaining the installation, have you had to do any customization of Ceph itself to be able to support the scale that you're building out? And has that posed any challenges of being able to stay up to date with the latest releases?
[00:06:43] Unknown:
We are running our own custom build of Ceph. Most of the patches applied to it were just plucked from upstream to fix bugs we encountered. And we encountered some pretty interesting and surprising bugs during our rollout. Like, when we initially launched the cluster internally to test, bucket name validation wasn't working. So you could create buckets that weren't valid domain names, which broke the DNS that we wanted to have for all buckets. Or we found an ACL violation issue, which was pretty severe and let you modify other people's resources if you knew just what to do, which had been fixed on the master branch but not ported to a release yet.
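Since every bucket gets its own DNS record, a bucket name has to be usable as a DNS label. Purely as an illustration of the kind of validation that was missing (not Ceph's or Linode's actual code), a rough sketch:

```python
# Illustrative sketch only: DNS-oriented bucket name validation of the kind
# described above, not the actual validation code used in production.
import re

# Each dot-separated label: lowercase letters, digits, and hyphens,
# starting and ending with a letter or digit.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` could safely become part of a DNS hostname."""
    if not 3 <= len(name) <= 63:
        return False
    # Reject names that look like IP addresses, which would be ambiguous in DNS.
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", name):
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))

assert is_valid_bucket_name("my-site-assets")
assert not is_valid_bucket_name("Bad_Name!")  # would not be a valid DNS name
```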
[00:07:24] Unknown:
And as far as the initial assumptions that you had when you were first getting involved with this project about the operability and maintainability of Ceph as the underlying layer for object storage, as you have progressed through the various levels of testing and release and now the general release, how have those assumptions been challenged or updated?
[00:07:44] Unknown:
Yeah. I mean, we started with a pretty good understanding of how Ceph works at scale because the block storage system was rolled out many years ago. So we had a pretty good idea of how to build out a cluster and keep an eye on it and monitor it for health and problems. The biggest new component was the RADOS gateways, which we had not worked with before. So if you're not familiar with the architecture of Ceph, at the heart of it is a system called RADOS, which is the Reliable Autonomic Distributed Object Store. And it basically has 3 front ends that they provide: CephFS, which just lets you mount a chunk of RADOS storage as a file system.
The block storage front end, which we use to power our block storage product. And then the RADOS Gateway, which is the S3-compatible API that lets you access the underlying RADOS storage in that manner. So we've already had clusters and maintained them for a long time, but the RADOS gateways were a totally new piece of infrastructure for us, and they presented their own challenges, both in figuring out the right number to have to get the kind of response times that we wanted without getting them clogged up, and how to route traffic efficiently to them. In order to have static sites in Ceph, you need a separate gateway that serves HTML instead of S3, which is an interesting design decision and presents a unique challenge.
And that is, as traffic comes in, we need to decide which of the 2 sets of gateways we want to send it to. We solve this problem and many others by having a smart proxy sit in front of Ceph, facing the Internet. And in order for it to work the way we wanted, it needed to be scriptable enough to examine incoming traffic and decide where it ought to be routed. But it also needed to be fast enough that we didn't incur a huge amount of overhead on every request coming into object storage. So while I initially prototyped something in Python, it was way too slow. It added, like, hundreds of milliseconds per request to everything that you did, and that's just unacceptable.
So after doing a lot of research and playing with several things, we landed on a piece of software called OpenResty. And, if you're not familiar, OpenResty is basically just NGINX, but with a bunch of plugins compiled in. Modules in NGINX are statically linked, so you have to add them at compile time. Although you could build NGINX with these modules yourself or with any selection you wanted, this distribution is very nice because it comes with everything packaged together in a state that is tested and works and is used throughout the industry. And the most powerful piece of that is the NGINX Lua module, which allows you to execute any Lua code that you write within an NGINX request context.
In doing that, we were able to have requests come in, decide whether this is looking to talk to the HTML gateways or to the S3 gateways, and route it with very, very little overhead per request. It also solved other problems we had, like enforcing rate limits per bucket in addition to per remote IP address, to prevent someone from trying to take a single bucket offline through too much traffic, or monitoring usage, since many of the clients are gonna be talking directly to S3 instead of going through any other system of ours. We needed a very low latency way to capture that traffic and make sure that we knew what was going on.
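For illustration only: the production routing runs as Lua inside OpenResty's request phase (a Python proxy was too slow, as noted above), and the "website-" host convention below is a hypothetical stand-in, not necessarily how Linode distinguishes the two kinds of traffic. This Python sketch just shows the shape of the decision being described.

```python
# Toy sketch of the routing decision: pick which pool of RADOS gateways
# should handle a request, based only on the Host header. Hypothetical
# naming convention; the real logic lives in Lua inside OpenResty.
S3_GATEWAYS = "s3_gateway_pool"            # RGW instances speaking the S3 API
STATIC_SITE_GATEWAYS = "web_gateway_pool"  # RGW instances serving static sites

def choose_upstream(host_header: str) -> str:
    """Return the name of the upstream pool for this request."""
    labels = host_header.lower().split(".")
    if any(label.startswith("website-") for label in labels):
        return STATIC_SITE_GATEWAYS
    return S3_GATEWAYS

# e.g. "my-bucket.website-newark.example.test" -> static site pool,
#      "my-bucket.newark.example.test"         -> S3 pool
```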
[00:11:16] Unknown:
And the quota limits are another interesting challenge that you're facing that isn't something that somebody running their own object storage system on Ceph or MinIO would really have to deal with, unless they're in some sort of enterprise context. And so I'm wondering what you're using for being able to handle that metering and those quota limits for people who are building projects that are leveraging your object storage. Well, we do use Ceph's quota system to keep track of and
[00:11:44] Unknown:
add caps to usage. But our approach has generally been that if you're a regular user trying to do normal, non-abusive things, you should have the availability ready for you. And unless you are using it at an exceptional scale, you shouldn't know that there's any cap. But if you are using it at an exceptional scale for a valid use case, all of those knobs are tunable by the support team. So opening a ticket when you hit the quota is probably enough to get it lifted to a point where you could do whatever you want. And you mentioned that you had to build this smart proxy for being able to handle the appropriate routing for the native object storage versus HTML requests,
[00:12:27] Unknown:
and you're using the quota limits for handling the storage capacities. But I'm wondering what other supporting systems or additional tools you've had to build around the object storage solution to be able to manage the availability and durability of the project, as well as any ongoing configuration maintenance and deployability of it? Absolutely.
[00:12:49] Unknown:
So we had to build an internal, like, very back end service, which basically administers Ceph clusters for us. And it handles creating credentials on demand for new customers and tracking usage through Ceph and, you know, all of the reporting that we need to that end. A lot of the monitoring and making sure that the service was durable, we could reuse existing systems for, because we had already been maintaining Ceph for a long time. But the administration of, specifically, the S3 component was entirely new. And our design goal of that was to have a centralized system that could handle keeping all of the clusters in sync with what we think things should be, while allowing the clusters themselves to be the authority on what data is stored within them, so that we don't have to keep an authoritative real time record of who owns what, where.
Because that would be just a huge, enormous, and maybe impossible to solve problem. Ceph itself is eventually consistent. And so our tracking of who has what data in what cluster will eventually be right. But if we need to know right now what's there, we want Ceph to be the 1 to tell you, because it's really the only piece of the puzzle that can know it. And as you were determining
[00:14:13] Unknown:
the deployment of the object storage, given that you already had the Ceph deployments for block storage, were there any different considerations that needed to be made for the hardware that it was getting deployed to? Or is everything just homogeneous
[00:14:29] Unknown:
across the different block and object storage supporting infrastructure? Well, I didn't have a very big role in designing the hardware, but I do believe it is a different build for object storage. From what I understand, object storage, while it also wants very high storage density, is more sensitive to running out of memory than block storage is. So we needed to make sure that these servers had enough memory to do what they needed to do to handle the load of storing
[00:14:57] Unknown:
these arbitrarily sized, often smaller chunks of data compared to the large volumes allocated in the block storage system. Yeah. And I'm sure that the request workloads are also a lot more bursty than they would be for block storage, where, as you said, it'll be a lot of small requests for potentially small files or fragments of files versus,
[00:15:16] Unknown:
most likely, more long running jobs on block storage that are going to be piping larger volumes of data through it? Well, really, the biggest difference there is that the object storage system talks to the public Internet just through our proxy, whereas the block storage system gives you volumes that you can mount to a single server. So when you're using block storage, you've already got a server running that is the only thing talking to this volume. And whether or not you're doing heavy IO, we're not getting requests for that same piece of data from, like, a thousand different places at once. In object storage, since it's on the Internet and you could be updating data while someone else is retrieving it, or you could be hosting images that are on a big popular website and getting requests from thousands of IPs at once, you do have a substantial difference in traffic.
That was 1 of the reasons that we were looking so hard into rate limiting, to make sure that no individual thing used so many resources that it overwhelmed the server. While rate limiting per client IP is important, by also rate limiting by the bucket that you were making requests to, we could make sure that if 1 piece of media somewhere exploded in popularity, those requests would still be seen and throttled or blocked to keep the cluster online. But the goal with rate limiting, of course, is just to protect the overall system and not to limit the usefulness of it. So the limits are quite high, and I don't think that they've ever really been hit in practice.
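As a toy illustration of that two-key approach (throttling per client IP and per bucket at the same time), here is a minimal token-bucket sketch in Python. The real enforcement happens in the Lua layer of the OpenResty proxy, and the rates and burst sizes here are invented.

```python
# Toy sketch of rate limiting keyed by client IP *and* by bucket, so one
# suddenly popular object can't saturate the cluster. Numbers are invented;
# the production limits live in the OpenResty/Lua proxy.
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate, self.burst = rate_per_sec, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

ip_limits = defaultdict(lambda: TokenBucket(rate_per_sec=100, burst=200))
bucket_limits = defaultdict(lambda: TokenBucket(rate_per_sec=500, burst=1000))

def should_allow(client_ip: str, bucket_name: str) -> bool:
    # A request must pass both limiters; either one can throttle it.
    return ip_limits[client_ip].allow() and bucket_limits[bucket_name].allow()
```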
[00:16:55] Unknown:
And so as far as the actual rollout of the product to general availability, what was the process that you went through to ensure that you could have the necessary confidence to know that it would be reliable enough for people to be able to use it at scale
[00:17:07] Unknown:
and being able to feel comfortable with sleeping at night knowing that it's out and being used by the general public? Oh, yes. I wanted to make sure I could sleep at night after we turned this thing on. We used a very agile process for the development of this product. So when I got brought onto the team and we earnestly started turning this from a proof of concept cluster into an actual product available for people, it took, I think, about a month before we had an internal alpha available within the company. And that was available to all of the employees. We told them, do whatever you want. Go nuts. We might break it, because we're still learning things about how this works as a product. But it was an incredibly powerful tool for giving us confidence and revealing the problems that customers would end up facing. We had 1 person in the company stand up a Prometheus cluster, which used the alpha as a back end for its data storage, which saw our IO go through the roof, like, pretty much instantly, because Prometheus is constantly pulling data in, and then it's taking it back out and mutating it and putting it back in. And it was a kind of workload that we just couldn't have produced on our own without, like, a concentrated effort. But this employee already had Prometheus stood up for something, and he wanted to see if object storage was a good back end for it and also give us a little bit of traffic, which was super eye opening. It helped us to tune Ceph for the workloads that we would actually expect to see. And it helped us to decide how to write the administration part of the system that would need to be sensitive to these kinds of workloads and permit them, because it was a real, legitimate, valid use, but it was just a lot of it.
After the alpha lasted for a while and we made some big changes to the system during that time from the things that we learned, we went ahead and did a closed beta. And that was active for maybe 3 or 4 months, during which time a select subset of customers that had expressed interest in the product were given access to it and asked to put their real workloads onto it, as much as they wanted to put them on a beta. And that too was very eye opening, less in tuning the system, because at that point we had it pretty much where we wanted it to be, but more in terms of customer expectations and what features they would want that we might not have foreseen. And again, a lot of changes went through the system during the beta as we learned the way the customers were going to use it.
What kind of things they expected to see, and what deficiencies we had in seeing what people were doing in the system. Because it's a lot different when you've just got a bunch of customers that are doing whatever they want, as opposed to a bunch of people who you work with that you can send a message to and say, hey, what are you doing? You know? And for workloads that are more data intensive
[00:20:03] Unknown:
where you want to ensure that there's a high degree of durability of the underlying information, I know that you have support for versioning of the objects within the buckets. But I'm curious if you have any support for being able to do replication of data across different regions or different zones for being able to ensure that you have multiple copies? Or is that something that would be pushed to the client application to ensure the replication of information? Well, presently, the data is replicated within the cluster several times.
[00:20:34] Unknown:
There would have to be a pretty catastrophic failure for us to lose data that's stored in Ceph. But we do not yet have support for replicating that data to other clusters. That's something that we spent a decent chunk of time looking into. We decided to put it off until we knew just how much would be consumed by just regular usage in 1 cluster before we committed extra space to replicating that data around the world
[00:21:02] Unknown:
and not knowing just how much space we were gonna need. And also, it depends largely on the use case whether having it replicated globally would be useful at the object storage level versus just making it available through a CDN. Because if it's somebody who's running a website, they could use something such as Fastly or CloudFront or Cloudflare for being able to replicate that information to their users. Whereas if it's somebody who's doing a lot of data processing, where they might have disparate clusters in different offices across the world and they want to be able to get access to that underlying data for analytics, that's where it's more useful for them to be able to actually have that data replicated to the different regions where they might wanna access it. Right. I mean, in some ways, if you replicate the data across the world, you are creating a CDN
[00:21:51] Unknown:
or at least most of it.
[00:21:53] Unknown:
Fair point. And so in terms of the benefits and new use cases that you've been able to enable internally at Linode,
[00:22:04] Unknown:
what are some of the more interesting or exciting capabilities that have come about from that? Oh, absolutely. Having object storage available has been super exciting within the company. And as soon as we opened up that internal alpha, people just started using it. And it was amazing to see the things that they came up with. Because, personally, I had things that I wanted it for, but I wouldn't have thought of the things that everyone else used it for. We've seen applications be made stateless in relatively simple ways by just using object storage to store information that would have otherwise had to reside on a server, which has massively improved the rollout process for those applications because the individual servers are no longer important. They're no longer special. You can just kill them and make a new 1 and it's fine. We've seen it used as a storage back end for systems that just plugged into object storage, which again made it much easier for them to store their data: they've got this object storage that we already had, and it's already going to be maintained.
We've seen it used as intermediary steps in pipelines, so as a convenient place that 1 pipeline could deposit data that something downstream would need. It's been a real game changer to have around, and I'm very glad that we did it for our own internal purposes.
[00:23:27] Unknown:
And I'm sure that having it available as you're deploying the managed Kubernetes service has been beneficial as well, because of the fact that a lot of cloud native workloads leverage object storage as a backing store rather than relying on persistent disk on the underlying infrastructure that they're running on? Absolutely.
[00:23:46] Unknown:
The LKE service has built in support for our block storage product. So if you need persistent disk storage, it's available to your Kubernetes clusters on our platform. But object storage can be much more useful when you don't want to have to mount a disk or worry about any of those details. You just have data that you need to put somewhere
[00:24:07] Unknown:
and know that it's gonna be there when you wanna retrieve it later. And as far as the S3 API itself, I'm curious what your thoughts are on its state as the de facto standard for object storage. Have you found it to be at all limiting? Or do you think that it's beneficial in general that there is this 1 standard that everyone has coalesced around? Well, I'll start off by saying that the S3 API is obviously very good. It is ubiquitous.
[00:24:35] Unknown:
It has excellent tooling. It plugs right into an amazing number of off the shelf things. And it makes it very easy to work with. But when developing this product specifically, I had to work with it at a much lower level. And when you actually start making calls to S3 directly instead of using some tool or library to do it, the design limitations of the API become very apparent. For instance, it's often not easy to compile all the information you want about an object or a bucket from just 1 API call. As an example, to fetch the ACLs of a bucket, you need to make a separate call.
And to fetch the permissions for other users, you have to make another separate call. And so it can be easy to build up a whole bunch of calls to S3 just to compile 1 piece of useful information. Additionally, it's got some features that, while very powerful, can also be very hidden, like the bucket versioning that you mentioned earlier, which has some very strange bits of behavior in that the prior versions won't be obvious when you are listing objects in most ways, because they're not returned from the regular S3 endpoint. And you have to make special calls to find them. But you can't delete a bucket that isn't empty. So if you want to remove a bucket that appears to be empty, you might not be able to because it's got prior versions.
And disabling versioning isn't enough to get rid of them in all cases. You often need to configure a lifecycle policy and then wait until they clear themselves out. We also had 1 customer during the beta period who was very confused about the usage in 1 of their buckets because we were reporting that they were using a huge amount of data. And from their perspective, it looked like they were using very little. And after working with the customer and looking at Ceph and trying to debug it, we found that it was actually all hidden in multipart upload metadata. So S3 gives you, I think, 5 megabytes per file per upload. And if the file you need to upload is bigger than that, you have to upload it in multiple chunks.
Most clients handle this for you seamlessly and it works great. But the client library they were using, it was actually a MinIO client in Java, was not aborting multipart uploads that failed in the correct way. So the metadata was staying in their bucket. But because you have to make several very strange API calls to see what multipart upload metadata is in a bucket, it wasn't obvious to them where this extra usage was coming from. Which I think speaks to the fact that while the S3 API is fantastic and it's very widely supported, it seems to have grown organically as new features were added to S3. And that's good in some ways. You have a very well supported and maintained thing that a lot of people know how to use. But at the same time, it makes it so that if you're just coming into the game and looking at it, you might look at it and say, I have no idea what's going on here. Do I think that it should be standardized and redone? Probably not. I mean, at this point, it's so widely supported that it would be terribly disruptive to the entire object storage ecosystem if there was just a new standard way to access it that everything had to support. And it probably wouldn't work.
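For anyone who hits the same hidden-usage problem, here is a hedged sketch of how you could surface and clean up stale multipart upload metadata with boto3; `s3` is assumed to be a client already configured against the S3-compatible endpoint, as in the earlier example.

```python
# Sketch: surface and abort incomplete multipart uploads that are silently
# consuming space. Assumes `s3` is a boto3 S3 client configured against the
# object storage endpoint (see the earlier example).
def abort_stale_multipart_uploads(s3, bucket: str) -> int:
    aborted = 0
    paginator = s3.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        for upload in page.get("Uploads", []):
            s3.abort_multipart_upload(
                Bucket=bucket,
                Key=upload["Key"],
                UploadId=upload["UploadId"],
            )
            aborted += 1
    return aborted

# A similar walk over list_object_versions reveals the "hidden" prior versions
# that keep an apparently empty, versioned bucket from being deleted.
```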
[00:27:58] Unknown:
But it is important, especially if you're working directly with S3, to really read the fine print, because there are some gotchas in there. Yeah. That's definitely true. And anybody who's used it for long enough, I'm sure, will have come across at least 1 or 2 of them. Oh, I don't doubt it. And I also believe that Ceph itself, while it does have S3 compatibility for its object storage, also has a different protocol that it implemented, at least initially, I think called Swift. And I'm wondering if you have any plans to expose that alternative interface for people who want to be able to leverage object storage using some of the clients built for that. They do have built in Swift support. We played with it for a little bit. It was not as widely supported in tools.
[00:28:38] Unknown:
And for that reason, mostly, we didn't consider it as important to implement. And so far, I don't think we've gotten any feedback from any customer asking for us to turn it on. So I don't think we plan to at this time. And now that
[00:28:51] Unknown:
the object storage product has hit general availability and you're deploying it to more of your data centers, what are some of the future plans that you have for it, either in terms of new capabilities
[00:29:03] Unknown:
or new internal tooling or support? Absolutely. For starters, we want to roll it out to more data centers and make it more widely available. I can't speak to where we're going next, but we're definitely going more places. We also wanna add some exciting new features like letting you configure SSL per bucket. This is something that that OpenResty proxy is gonna be great for because it's in the perfect position to terminate SSL and it is plenty dynamic to figure out what certificate you'd uploaded for a bucket. And of course, we always have work to do on the back end to make it more robust and work more with the things that we find emerging as we deploy the product to more places.
[00:29:44] Unknown:
And you mentioned that you've been involved in a number of different projects since you started at Linode, and now the object storage has hit general availability. Is this something that you're going to stick with
[00:30:00] Unknown:
for a while? Or do you have something new on the horizon that you're planning to get engaged with? Well, I'm certainly still keeping an eye on it and supporting the team as best I can. But I have been moved off to another project. And I don't think that I could say what it is yet.
[00:30:12] Unknown:
Fair enough. I don't think we've announced it anywhere. So Alright. And so as far as your experience of building out this object storage platform and releasing it publicly, what have you found to be some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:30:29] Unknown:
Well, definitely the most unexpected lesson was to keep an eye on Ceph's bug tracker. They're very good at reporting and fixing issues, but they don't always backport them to releases. And their release cycle is often slower than how fast we want bugs to be fixed in our clusters. So their bug tracker is massive. If you've ever looked at it, it's a very big and very long running project. But since the project is open source and they track everything so well, it's pretty easy, once you've got a handle on what's going on with it, to take the patches they put up and compile them into our versions of Ceph and make sure that our customers aren't affected by the bugs that are found upstream. I certainly wasn't expecting that when we went into shipping this because, largely, that wasn't our experience with the block storage product. And are there any other aspects
[00:31:19] Unknown:
of object storage or your work on the Linode product or anything about your experiences of getting it deployed that we didn't discuss that you'd like to cover before we close out the show?
[00:31:30] Unknown:
Yeah. I would just like to put a point on how useful of a product it is and how many possible applications we've found just internally. As another example of a cool thing we've done with this, our front end application is a single page React app, and it's open source; it's on GitHub. Every time they put up a PR to it, our pipeline builds that code and uploads it to object storage so that it's accessible immediately. And not only can the team that's working on it see it, but the other teams related to that project that they're making front end changes for can immediately see and use that code. And it's been a very powerful thing to improve how fast we're able to ship things, how much confidence we can have in front end changes, and how much testing we can do with them. So it's just 1 more example of something that we found to do with it that is very exciting and that I hadn't even considered when we started the project.
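As a rough sketch of that kind of pipeline step, uploading a built single-page app to a bucket so reviewers can open it immediately could look roughly like this. The bucket, prefix, paths, and preview URL pattern are hypothetical, not Linode's actual CI code, and `s3` is a configured boto3 client as shown earlier.

```python
# Hypothetical sketch of a CI step that publishes a built single-page app to
# object storage for per-PR previews. Names and paths are made up; `s3` is a
# boto3 S3 client configured against the object storage endpoint.
import mimetypes
from pathlib import Path

def publish_preview(s3, build_dir: str, bucket: str, pr_number: int) -> str:
    prefix = f"pr-{pr_number}/"
    for path in Path(build_dir).rglob("*"):
        if path.is_file():
            key = prefix + path.relative_to(build_dir).as_posix()
            content_type, _ = mimetypes.guess_type(path.name)
            s3.upload_file(
                str(path),
                bucket,
                key,
                ExtraArgs={"ContentType": content_type or "application/octet-stream"},
            )
    # With static site hosting enabled on the bucket, something like this URL
    # would be where reviewers find the preview (pattern is illustrative only).
    return f"https://{bucket}.website-example.invalid/{prefix}index.html"
```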
[00:32:29] Unknown:
Alright. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I would like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:32:47] Unknown:
The most frequent question I've heard from people, both internal and external, is: how do I tell what's really going on in here? Right? The nature of object storage is that you want to upload data to it, and a lot of it, and constantly. But unless you know up front how long you're going to need to retain that data for, and unless all of your data can be classified as "this is how long I need it," the lifecycle policies that S3 makes available aren't really useful to you, because they'll say, okay, well, this is gonna expire after a week. But if you have data that is gonna need to be around for a couple months and some other data that only needs to be around for a couple days, it can be very hard to manage.
And what's worse, it can be very easy to not see that any of that data is sitting there until you end up looking at the entire bucket and say, wow, this is huge. And then you're left with the unenviable task of having a huge pile of data where it can be very hard to figure out what of it is important and necessary, and what of it is just an artifact of something that should be gotten rid of. I think the biggest gap in tooling is something that would solve that problem, that would help look at the data that you have in an object storage bucket and say, oh, well, you've actually accessed this recently, and this is accessed all the time, and this has been sitting here for a year and a half and you don't need it. To help people actually manage their data without just having it accumulate forever and build technical debt.
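Lifecycle rules are the closest existing answer to this, and they can at least be scoped per prefix. Here is a hedged boto3 sketch of that: the prefixes and retention windows are invented, whether a given Ceph/RGW deployment honors every rule type is something to verify against its documentation, and `s3` is a configured client as in the earlier examples.

```python
# Sketch: per-prefix lifecycle rules, which is roughly as far as the S3 API's
# built-in data management goes today. Prefixes and retention periods are
# invented for illustration; `s3` is a boto3 S3 client.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {   # Short-lived scratch data
                "ID": "expire-tmp",
                "Status": "Enabled",
                "Filter": {"Prefix": "tmp/"},
                "Expiration": {"Days": 3},
            },
            {   # Longer-lived exports
                "ID": "expire-exports",
                "Status": "Enabled",
                "Filter": {"Prefix": "exports/"},
                "Expiration": {"Days": 90},
            },
            {   # Clean up the hidden multipart metadata discussed earlier
                "ID": "abort-stale-multipart",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    },
)
```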
[00:34:20] Unknown:
Yeah. It's definitely a big problem, and 1 that people are trying to attack with things like data catalogs and data discovery services, but it's still an imperfect art, and there's certainly room for improvement, and probably room for tools that are more specifically targeted to object storage, as you mentioned, as well. So, definitely something worth exploring further, since there's certainly not a silver bullet for it yet. Absolutely. Well, thank you very much for taking the time today to join me and share your experience of building out the object storage platform at Linode. And it's definitely a useful service and 1 that I'm using myself. So thank you for all of your time and effort on that front, and I hope you enjoy the rest of your day. Thanks for having me.
Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Will Smith's Journey at Linode
Involvement in Data Management
Overview of Linode's Object Storage
Choosing Ceph for Object Storage
Design and Implementation Challenges
Maintaining S3 Compatibility
Customizing Ceph for Scale
Smart Proxy and Traffic Routing
Quota Limits and Metering
Supporting Systems and Tools
Hardware Considerations
Rate Limiting and Traffic Management
Ensuring Reliability and Scalability
Replication and Data Durability
Internal Use Cases and Benefits
Kubernetes and Object Storage
S3 API as a Standard
Future Plans for Object Storage
Lessons Learned
Usefulness and Applications
Closing Remarks