Summary
Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created to provide a simple mechanism for running your storage system on the same platform as your application. In this episode Tamal Saha explains how the KubeDB project got started, why you might want to run your database with Kubernetes, and how to get started. He also covers some of the challenges of managing stateful services in Kubernetes and how the fast pace of the community has contributed to the evolution of KubeDB. If you are at any stage of a Kubernetes implementation, or just thinking about it, this is definitely worth a listen to get some perspective on how to leverage it for your entire application stack.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support.
- Understanding how your customers are using your product is critical for businesses of any size. To make it easier for startups to focus on delivering useful features Segment offers a flexible and reliable data infrastructure for your customer analytics and custom events. You only need to maintain one integration to instrument your code and get a future-proof way to send data to over 250 services with the flip of a switch. Not only does it free up your engineers’ time, it lets your business users decide what data they want where. Go to dataengineeringpodcast.com/segmentio today to sign up for their startup plan and get $25,000 in Segment credits and $1 million in free software from marketing and analytics companies like AWS, Google, and Intercom. On top of that you’ll get access to Analytics Academy for the educational resources you need to become an expert in data analytics for measuring product-market fit.
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Your host is Tobias Macey and today I’m interviewing Tamal Saha about KubeDB, a project focused on making running production-grade databases easy on Kubernetes
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by explaining what KubeDB is and how the project got started?
- What are the main challenges associated with running a stateful system on top of Kubernetes?
- Why would someone want to run their database on a container platform rather than on a dedicated instance or with a hosted service?
- Can you describe how KubeDB is implemented and how that has evolved since you first started working on it?
- Can you talk through how KubeDB simplifies the process of deploying and maintaining databases?
- What is involved in adding support for a new database?
- How do the requirements change for systems that are natively clustered?
- How does KubeDB help with maintenance processes around upgrading existing databases to newer versions?
- How does the work that you are doing on KubeDB compare to what is available in StorageOS?
- Are there any other projects that are targeting similar goals?
- What have you found to be the most interesting/challenging/unexpected aspects of building KubeDB?
- What do you have planned for the future of the project?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- KubeDB
- AppsCode
- Kubernetes
- Kubernetes CRD (Custom Resource Definition)
- Kubernetes Operator
- Kubernetes Stateful Sets
- PostgreSQL
- Hashicorp Vault
- Redis
- Elasticsearch
- MySQL
- Memcached
- MongoDB
- Docker
- Rook Storage Orchestration for Kubernetes
- Ceph
- EBS
- StorageOS
- GlusterFS
- OpenEBS
- CloudFoundry
- AppsCode Service Broker
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the project you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With 200 gigabit private networking, scalable shared block storage, and a 40 gigabit public network, you've got everything you need to run a fast, reliable, and bulletproof data platform. And if you need global distribution, they've got that covered too with worldwide data centers, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances.
Go to dataengineeringpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. Alluxio is an open source distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows modern computation intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud.
Go to dataengineeringpodcast.com/alluxio, that's A-L-L-U-X-I-O, today to learn more and to thank them for their support. And understanding how your customers are using your product is critical for businesses of any size. To make it easier for startups to focus on delivering useful features, Segment offers a flexible and reliable data infrastructure for your customer analytics and custom events. You only need to maintain one integration to instrument your code and get a future-proof way to send data to over 250 services with the flip of a switch. Not only does it free up your engineers' time, it lets your business users decide what data they want where.
Go to dataengineeringpodcast.com/segmentio today to sign up for their startup plan and get $25,000 in Segment credits and $1 million in free software from marketing and analytics companies like AWS, Google, and Intercom. On top of that, you'll get access to the Analytics Academy for the educational resources you need to become an expert in data analytics for measuring product-market fit. And you listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season.
We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and to take advantage of our partner discounts when you register. And go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. And please help other people find the show by leaving a review on iTunes and telling your friends and coworkers.
[00:03:04] Unknown:
Your host is Tobias Macey. And today, I'm interviewing Tamal Saha about KubeDB, a project focused on making running production grade databases easy on Kubernetes. So, Tamal, could you start by introducing yourself?
[00:03:16] Unknown:
Hey. I'm Tamal. Glad to talk to you today. So I'm a software engineer by profession, and the founder of a company called AppsCode, which is the primary sponsor behind the KubeDB project. I was born and raised in Bangladesh, and I came to the US as a PhD student. After finishing my masters, I spent some time working at Amazon for a short period, and then I worked at Google for around 3 and a half years. And then I started the company around 2 and a half years ago. So that's a short background on myself.
[00:03:54] Unknown:
And do you remember how you first got involved in the area of data management?
[00:03:58] Unknown:
Yeah. So when we started the company, we were trying to build something we would call an integrated development platform. You can think of it as, like, a Google Apps for developer tooling. You know, if you sign up for Google Apps, with a single sign up you get your Gmail, Drive, calendar, all kinds of applications. So if you think of doing software development, you need something like a GitHub, then some sort of CI/CD, maybe Jenkins, then some way to deploy that application, and then monitoring and all that. And we wanted to build a platform that does all those things, maybe taking some of the existing open source stuff and gluing it together as a platform. So we started doing that, and we were trying to build that platform on top of Kubernetes in, like, 2016.
And after spending quite a bit of time trying to do that, we came to realize that there are a lot of limitations, or you could say gaps, in what the upstream Kubernetes project provides. And one of the key areas was data management. One of the things that we noticed is that once you start running some of your applications on Kubernetes, it is much easier if you can run everything on Kubernetes, because otherwise you have to maintain 2 separate stacks, especially if it is sort of a greenfield project that you are developing. You don't want to have multiple stacks. So we started using Kubernetes, and then we found that one of the key areas is, if you want to run a database, or if you have some system like Jenkins where you store data on a local disk, how do you back that up? How do you make sure the database fails over, and all that?
So we started in the Kubernetes space and started building some tooling around managing these databases on Kubernetes, just for ourselves. And that's how we got started. Then at some point we open sourced it, in early 2017. And since then, we have been seeing a lot of interest from people in the open source community, and we have been continuing to develop it as the community develops.
[00:06:25] Unknown:
And so you've given a bit of an explanation about what KubeDB is and how it got started. And I'm wondering if you can talk a bit more about the main challenges that exist when trying to run stateful systems on top of Kubernetes.
[00:06:39] Unknown:
Yeah. So, you know, one of the interesting challenges with Kubernetes is that it's a very dynamic environment. If you think about the pre-Kubernetes world, if you had a small setup, you would probably just deploy manually, maybe with some shell scripts and things like that. And if you had a large deployment, you would probably use some sort of configuration management software, like Chef or Puppet, to essentially bring up VMs with the software stack pre-installed. And the general expectation is that those VMs will keep running unless you, as the SRE or the DevOps for the system, decide to move things over.
Even with a cloud provider, or if you are on prem with whatever system you are using to manage it, like OpenStack, if you have an issue, the system will go down, but it will stay there, and then you have to do the manual recovery. Or, in the case of a cloud provider, it can move over to a different VM, but the data on the local disk stays there. Now, what happens in the Kubernetes world is that you are running these containers as what are called pods, which are essentially collections of containers in the same namespace. So what happens is that if something goes down, Kubernetes is intelligent; it is a self-healing system.
What that means is that it will go and basically move that pod onto a different node. And then, if you are on a cloud provider where the storage can be moved, it will actually move the storage and restart the pod, which is great, because if you have a stateless system, this is all you really need: your pod moves over, it restarts, you're good. But if you have a stateful system, one of the challenges is that most stateful systems are not designed to work in this kind of really dynamic environment.
So one of the things that happens in this dynamic environment is that the IP address changes. When a Kubernetes pod restarts on a different node, or even on the same node, it is not going to have the same IP. Many of the existing databases, think of Redis or Postgres, use the IP address of the host, or the pod in the case of Kubernetes, as the identity of that replica of the database. So what do you need to do there? If you are running a stateful application, even if you use a Kubernetes concept called a StatefulSet, which guarantees a static hostname, the IP address still changes. So you need some additional tooling around your stateful applications, for example databases, to make sure that when the pod comes back up, it again correctly connects back to the cluster, whether it is still a primary or, you know, a follower or secondary machine. That is one of the things.
And then the other thing is, if you are really using a stateful application for a production use case, you want to make sure it is properly backed up on a periodic schedule. If the database supports streaming backup, like Postgres, where you can back up WAL files and things like that, you want those properly done, and then you want a way to recover from those backups in case of a disaster scenario. Those also are not really directly supported by the upstream project. The upstream project provides the constructs and the basic features necessary to implement those things, but they have to be done by whoever is running a stateful application on top of Kubernetes.
So those are the challenges that developers using Kubernetes and running stateful applications on Kubernetes face. And KubeDB really addresses that aspect of running production workloads, production stateful databases, on Kubernetes.
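The identity problem described here, a rescheduled pod keeping its stable hostname but getting a fresh IP, is why rejoin tooling resolves peers by name rather than by cached address. A minimal sketch of that idea in Python; the registry class and hostnames are invented for illustration, not KubeDB code:

```python
# Sketch: a registry maps stable StatefulSet-style hostnames ("pg-0", "pg-1")
# to whatever IP the pod currently has. A replica that caches the IP breaks
# after a reschedule; resolving by stable name still works.

class ClusterRegistry:
    def __init__(self):
        self._ips = {}  # stable hostname -> current IP

    def schedule(self, hostname, ip):
        """Called whenever the pod is (re)started somewhere in the cluster."""
        self._ips[hostname] = ip

    def resolve(self, hostname):
        return self._ips[hostname]


registry = ClusterRegistry()
registry.schedule("pg-0", "10.0.1.5")   # primary lands on node A

cached_ip = registry.resolve("pg-0")    # a naive replica caches the IP

# Pod is rescheduled: same stable name, brand-new IP.
registry.schedule("pg-0", "10.0.7.9")

assert registry.resolve("pg-0") == "10.0.7.9"  # name-based lookup survives
assert cached_ip != registry.resolve("pg-0")   # the cached IP is now stale
```

This is the same reason StatefulSet DNS names matter: the name is the durable identity, the IP is not.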
[00:10:44] Unknown:
And you mentioned that one of the reasons for wanting to run the database on Kubernetes, as opposed to running it on a dedicated instance or via some managed hosting, is having it use the same life cycle and deployment tooling as your development. But I'm wondering if there are any benefits to having it be a hosted instance that lives outside of the Kubernetes environment, managing it as a separate entity, and what the real benefits are of having the database as part of the same Kubernetes platform?
[00:11:23] Unknown:
So one of the things, as you just mentioned: obviously the benefit is the same stack, which comes with all your monitoring and all your alert management. Like, if you are using Prometheus for your monitoring, you can have a common view, what is called a single pane of glass view, of your entire production infrastructure. So that is definitely a big benefit. And then it depends on where you are running your workloads. If you are on prem, like, on bare metal, or maybe with OpenStack and using VMs to run your workloads, there you do not usually have the benefit of a hosted service.
I mean, if you are on a cloud provider, then yes, you do have the option of using one of the hosted services. But we see a lot of users who are on bare metal or on on-prem Kubernetes clusters. They need something like KubeDB to operate and manage their databases. So that is one use case. The other benefit of running databases on Kubernetes is that, for some of these cloud vendors, if you actually look at their managed database offerings, they usually don't have the latest version. Sometimes they are quite a bit behind whatever the latest version of the database is. And if you have an application use case where you actually want to use the latest version, then, if you are running on Kubernetes, you can just get the latest Docker image. With KubeDB, you are able to do that. So that is another benefit that we see. And then the other benefit is, if you look at the pricing model of the cloud providers; we recently did a survey of our KubeDB users, a couple of months ago, in February.
And one of the things that we saw about why people are running the database on Kubernetes, even when they are on a public cloud, is because it is much cheaper. Like, if you are on, let's say, AWS and using their managed Elasticsearch solution, and say you are running a 3 node Elasticsearch cluster with m4.large or xlarge instances; we did some calculation recently. If you are on purely on-demand instances, they are charging an extra $2,500 or so for that 3 node Elasticsearch cluster. So if you are on Kubernetes, the benefit is that you only pay for the VMs and the disk space; you are not really paying anything extra, at least not in monetary terms, for the managed layer. And if you think about the managed services that the cloud providers offer, they are not really being your DBA, right? All they are doing is making sure that your database is running, that it is properly backed up, and that if you need to recover or restore, there is a way to do it. And all those things can be done using this operator pattern on Kubernetes. That's what we are doing with KubeDB.
And, you know, our product, KubeDB, is still getting hardened at this point, after the last 2 years of development. But you can imagine that at some point you can get the same quality of service when you are running the databases on Kubernetes, and you don't have to pay that extra money. So for a lot of the people who are cost conscious, we do see them running databases on Kubernetes using KubeDB. In fact, in the survey that we ran, one of the interesting stats was that we saw people using multiple ways of running databases. 31% of people said they are using a cloud provider's managed solution. Then 43% said that they are self-hosting, essentially running it outside Kubernetes in some fashion. And then 45% said that they were using Helm charts, just directly using Helm charts to run databases, before they were using KubeDB. So, obviously, that adds up to more than 100%; people are using multiple ways of running databases as they figure out how their Kubernetes transition happens.
[00:15:48] Unknown:
And so can you describe how KubeDB itself is implemented and how it has evolved since you first began working on it?
[00:15:58] Unknown:
Yeah. So Kubernetes has evolved a lot, and in the same way, KubeDB has evolved alongside it. KubeDB right now is implemented as a collection of CRDs, which are called custom resource definitions, and an operator for each database. If you're familiar with the way Kubernetes works, you have the Kubernetes API server, and there you can define or register custom resource types; it's kind of like defining a resource type in a REST API server. For KubeDB, we have defined a number of these resources, one for each type of database: Postgres, Mongo, MySQL, all of that. And then we have written this operator.
So one of the interesting things about Kubernetes is that it will give you a push notification kind of concept. It's called a watch in Kubernetes lingo, but it's like a push notification you get on your mobile phone when something on that resource changes. And then you can write reactive code; it's like a control loop that will take actions as needed. So the way it works is that a user deploys this operator in the cluster as a deployment, and then they can say, okay, I want a Postgres database with clustering, maybe 1 master and 2 replicas, and this version of the database, and this much CPU and memory for each node, etcetera. Then they'll create that resource definition.
And once they create the definition, our operator will get a push notification, like a watch notification, that a new resource has been created. Then it figures out the necessary things that it needs to do: essentially, it needs to start a pod, it needs to create the storage space needed for that database if it is not there currently, so it will actually create those as other resources in the Kubernetes context. And then the controllers for those resources sort of push the database along. And once all of those aspects are ready, like, the database is ready for the user, we get back to the user saying, hey, the resource that you created now has the status ready.
So you can start using that with your application. So that's sort of the general model that we follow. And right now, we have a separate operator for each database that we support: Postgres, Elasticsearch, MongoDB, MySQL, Redis, and Memcached. Those are the common databases that we see a lot of requests for, and there is an operator for each one of them. But then for the user, we combine them all into a single Go binary, ship that as a single Docker image, and deploy that on Kubernetes. And how
[00:19:00] Unknown:
has the development cycle of Kubernetes and the overall velocity of the project impacted your ability to build and support KubeDB? And I'm also curious how some of the relatively recent features, such as stateful sets, factor into the ways that you're able to leverage existing primitives in Kubernetes rather than having to implement them on your own? Yeah. So
[00:19:24] Unknown:
when we actually started the project, back in early 2016, internally we were not using any of these concepts. There was no concept of a CRD; there used to be something called a TPR, or third party resource. So when we originally started the project, we were essentially building our own regular, standard HTTP server, and if you made a request to that, it would run the Kubernetes pods as a deployment. And that was really our version 0.1.0. And then we became familiar with the concept of TPRs, and we switched to that.
Then around, I believe, the Kubernetes 1.5 or 1.6 release, they released this concept of PetSet first, and then they renamed that to StatefulSet; I believe it was in 1.6. And at that point, we switched over from Deployments to StatefulSets. This was a pretty important development for running stateful applications on Kubernetes, because a StatefulSet does a couple of important things. One is that it essentially gives each pod a static hostname, which is useful in certain contexts. The other is that a StatefulSet guarantees that Kubernetes will never create more than the number of replicas you want.
So for example, if you are doing a Deployment and, let's say, you have 3 replicas and you decide to change the Docker image tag, it will actually create a new replica with the new image tag first. So momentarily, your number of replicas will go up to 4. You can control how quickly it switches over, but it will go to 4 at first, then kill one of the old pods, and essentially do these rolling updates. But if you are running a database, you usually don't want that kind of increase in replica count, because, especially in a clustered scenario, you want your replica count to be 1 or 3 or 5, not 4, even momentarily. So those sorts of challenges are actually addressed by StatefulSets, especially if you are switching over or updating image tags. So that was pretty important.
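The difference between a Deployment's surge-style rolling update and a StatefulSet's one-at-a-time replacement can be simulated schematically. This is an illustrative toy model of the replica counts, not actual Kubernetes behavior in all its configurable detail:

```python
def rolling_update_surge(replicas):
    """Deployment-style update (maxSurge=1): start each new-image pod
    before killing an old one. Returns the peak number of live pods."""
    counts, live = [], replicas
    for _ in range(replicas):
        live += 1               # new-image pod starts first
        counts.append(live)
        live -= 1               # then an old-image pod is killed
        counts.append(live)
    return max(counts)

def statefulset_update(replicas):
    """StatefulSet-style update: terminate a pod, then start its
    replacement, one ordinal at a time. Never exceeds the desired count."""
    counts, live = [], replicas
    for _ in range(replicas):
        live -= 1               # old pod terminated first
        counts.append(live)
        live += 1               # replacement started
        counts.append(live)
    return max(counts)

assert rolling_update_surge(3) == 4  # momentarily 4 pods: bad for a 3-node quorum
assert statefulset_update(3) == 3    # never more than the 3 you asked for
```

The second invariant is exactly the guarantee Tamal is describing: a clustered database that expects 1, 3, or 5 members never briefly sees a 4th.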
And then the other important development that happened in the Kubernetes space was that the TPR concept that was originally introduced was really an alpha feature. Since then, there has been a lot of interest in the Kubernetes community in building these kinds of applications for Kubernetes, and as a result, the upstream project introduced a new version of this, which is called the CRD, the custom resource definition. That has evolved a lot and has been improving with every release.
I mean, just last December, when Kubernetes 1.13 was released, they introduced a concept called version conversion for your CRDs. So think of it this way: we talked about this YAML version where you decide how you want to specify the desired database configuration. But let's say at some point we find out that, okay, we need to change the format of that YAML object. Before 1.13, there was no way to do that migration automatically.
All you could really do was shut down your app, do the migration manually, and then bring it back up. And if you think in terms of a production quality system, that is not really a great scenario or user experience. So those things have been addressed now. All those improvements in the CRD by the upstream project have been really essential to getting KubeDB to a point where we are very comfortable saying that it is production ready, if people are willing to use it for production quality databases.
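The watch-and-react pattern described through this answer, an operator observing a custom resource and converging actual state toward the declared spec, reduces to a reconcile loop. A schematic pure-Python sketch of that control loop; the resource fields and pod names are invented for illustration and this is not KubeDB's implementation:

```python
# Desired state comes from a (hypothetical) custom resource; actual state
# is whatever child resources currently exist. Reconcile creates whatever
# is missing, which is the heart of the operator pattern.

desired = {"kind": "Postgres", "name": "mydb", "replicas": 3}
actual = {"pods": []}  # nothing provisioned yet

def reconcile(spec, state):
    """One pass of the control loop: converge state toward spec."""
    want = {f"{spec['name']}-{i}" for i in range(spec["replicas"])}
    for pod in sorted(want - set(state["pods"])):
        state["pods"].append(pod)  # "start" the missing pod
    return state

reconcile(desired, actual)
assert sorted(actual["pods"]) == ["mydb-0", "mydb-1", "mydb-2"]

# A watch event arrives (a pod died); another reconcile pass heals it.
actual["pods"].remove("mydb-1")
reconcile(desired, actual)
assert len(actual["pods"]) == 3
```

In a real operator the "watch event" is delivered by the Kubernetes API server rather than called by hand, but the converge-on-every-event structure is the same.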
[00:23:47] Unknown:
And so you've described how KubeDB enables you to deploy these different database engines and get them configured. And I'm wondering, what are some of the other aspects of the overall life cycle of the database that it simplifies and somewhat automates, versus doing it manually?
[00:24:13] Unknown:
Sure. So one of the first things that we tried to address was just running a database: bringing up a database with as few bits of information from the user as possible. All we really want is for the user to tell us, okay, I want to run a Postgres database or an Elasticsearch database, maybe this version, and this many replicas, or maybe 1 master and 2 replicas; it depends on the database type. So the first pain point that we address is that, given just that little bit of information, we can create all the rest, which is creating the StatefulSet, creating a Kubernetes service, creating the service account, and, if you have a security policy enabled, addressing those things. If you think in terms of doing those manually, that is quite a bit of knowledge that you need, both about Kubernetes itself and about exactly how it applies to that particular database system. So those are the first pieces of the challenge that we handle as an operator. And another thing that I mentioned before: if you are doing clustering, then, depending on the database system, you cannot just use the official Docker images. You will need some additional help to make sure that when the pod fails over and the IP changes, those things are properly addressed, whether by using some sort of leader election algorithm that is supported in Kubernetes or some other mechanism; it could be a combination of what the database supports, like DNS SRV records. Based on what the database supports, it is done. So those are the first things that we handle.
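The "little bit of information" expanding into a full set of Kubernetes objects can be illustrated with a toy expander. The field names and derived objects below are invented for illustration and do not match KubeDB's actual CRD schema:

```python
def expand(spec):
    """Toy version of what an operator derives from a minimal database
    spec: a StatefulSet, a Service, and a credentials Secret."""
    name = spec["name"]
    return {
        "StatefulSet": {
            "name": name,
            "replicas": spec.get("replicas", 1),   # default: single node
            "image": f"postgres:{spec['version']}",
        },
        "Service": {"name": name, "port": 5432},
        "Secret": {"name": f"{name}-auth", "keys": ["username", "password"]},
    }

resources = expand({"name": "mydb", "version": "11.1", "replicas": 3})
assert resources["StatefulSet"]["replicas"] == 3
assert resources["StatefulSet"]["image"] == "postgres:11.1"
assert resources["Secret"]["name"] == "mydb-auth"
```

The point is the fan-out: three lines of user intent become several interlinked objects the user would otherwise have to author and keep consistent by hand.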
And then the other thing is automated backup and recovery. So you can do periodic backups: you can give us a cron expression, and then you can say, this is my S3 bucket or GCS bucket or Azure bucket, and take backups every 4 hours or 6 hours; you can decide. And then, if the database actually supports automated streaming backups, we set those up; for example, in the case of Postgres, you can do WAL archiving, so you continuously ship your WAL files to a remote bucket. Those kinds of things we set up. And then, obviously, how do you recover? So that's the other aspect of it. The recovery could be just, okay, bring back the old version, or it could be, okay, actually clone the database: the old one that is running is fine, but recover from that backup into a different namespace, or maybe even a different Kubernetes cluster, so I can get another copy and actually do some testing or analysis or whatnot, recovering from some point in time. Those kinds of things we support. And then the other thing that we recently introduced was user management support. So let's say you are using many different databases: Postgres, Mongo, and MySQL.
So how do you do user management? We integrated this with Vault, HashiCorp's Vault project. You can use Vault to dynamically issue a new user for your Postgres or MySQL or MongoDB cluster. And for those users you can also decide up front how much permission they have: you can say, only grant permission on this table or that table or that database, depending on the type of database. And then the credential that gets issued, you can inject directly into some other application — say you are running an API server that writes data to that database.
That can be injected automatically via a CSI driver, so the whole lifecycle gets managed. It will do the kinds of things a highly compliant environment requires, like: my database credential gets rotated every, I don't know, 24 hours — those kinds of things become possible. We actually released that a couple of months ago. We call the project KubeVault, because it works with KubeDB, but you can also use it to manage just any general secret you have — database secrets, your cloud provider secrets, just a secret store. So that is another companion project we introduced.
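The mechanism underneath this is Vault's database secrets engine, where a role defines scoped SQL grants and a TTL, and Vault stamps a fresh username/password into the templated statements each time a credential is requested. A sketch of such a role — the role name and grants shown are illustrative:

```python
# Sketch of a Vault database-secrets-engine role of the kind described:
# Vault issues short-lived Postgres users with scoped grants, and the
# credential expires (and is revoked) after its TTL. The {{name}},
# {{password}}, and {{expiration}} placeholders are Vault's templating.
readonly_role = {
    "db_name": "demo-pg",       # assumed connection name
    "default_ttl": "24h",       # credential rotates every 24 hours
    "creation_statements": [
        'CREATE ROLE "{{name}}" WITH LOGIN PASSWORD \'{{password}}\' '
        "VALID UNTIL '{{expiration}}';",
        'GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{{name}}";',
    ],
}
```

An application then reads its database credential from Vault (or has it injected), and never sees a long-lived password at all.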
And then one of the things we have been doing recently — the backup and recovery we were just talking about — is that we are currently going through a redesign of that aspect. We have been getting a lot of requests for things like encrypted backups and retention policies, so we are reworking some of those pieces. So those are the sides of database lifecycle management
[00:29:27] Unknown:
that we provide today as part of KubeDB. And forgive me if you covered this already, but does KubeDB also handle any of the upgrade aspects of deploying new versions of the database? I know that for things like Postgres there can be some tricky pieces: you deploy the new version and then migrate the storage format to work with the new instance, so it requires having the old version of the database available to run the upgrade. Or for Elasticsearch, depending on the versions you're going between, you can either do a rolling upgrade or you might need to do a complete reindex of the data. So I'm curious how much of that overall lifecycle management KubeDB builds in. Yeah, so this is actually one of the most challenging aspects of managing databases in general.
[00:30:20] Unknown:
We are not looking to support automatic major version updates. I don't think that's really realistic, because with a major version update you may even have to rewrite parts of your application. Right? If you are familiar with Elasticsearch, every time they update their major version they actually change things, like how their APIs work. So that's not something we think we will be able to support automatically. For minor version and patch version updates, it is currently not automatic either, but we have the pieces in place that you can use to manually do those changes, and we are looking forward to adding those automated upgrade options in future releases.
Changing a patch version is fairly straightforward, but you can think of scenarios where it gets a little tricky. For example, say you are running something like a 3-node Redis cluster. For each of the shards, one node is the primary. Right? So when you upgrade, what you really want to do is make sure the secondaries get updated first and the primary gets updated at the end, because that way you only need one failover. But if you just go in any random order, then if you upgrade the primary first, you get one failover. Then the one that becomes the new primary — if you upgrade that, that's another failover. And then the last one, the third one, gets upgraded, and that's another failover. So in the worst case you end up with three failovers instead of one. Doing those aspects of the upgrade process intelligently is something we are looking forward to doing, but right now we leave it to the user, whoever is managing the database, to make sure they do those things in the correct sequence. What we do support is a rolling upgrade. A rolling upgrade will not do the intelligent part of saving the primary for the end; if you have three pods — pod-0, pod-1, pod-2 — it will first change pod-2, then pod-1, then pod-0. It doesn't have any knowledge of what the correct order is. So that is not currently supported, the intelligent upgrade process.
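The ordering logic described above — upgrade all the secondaries first and the primary last, so that at most one failover occurs — can be sketched as:

```python
def upgrade_order(pods, primary):
    """Order pods for a rolling upgrade so that secondaries go first and
    the primary goes last, minimizing failovers: one instead of up to
    three in the 3-node worst case described above."""
    return [p for p in pods if p != primary] + [primary]

# Contrast with the naive reverse-ordinal order (pod-2, pod-1, pod-0),
# which has no knowledge of which pod is primary and can trigger a
# failover on each step if the primary is hit early.
```

For example, with pods `["pod-0", "pod-1", "pod-2"]` and `pod-1` as primary, the safe order is pod-0, pod-2, then pod-1.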
[00:32:55] Unknown:
But we're looking to do that. And as far as being able to add support for a new database engine, I'm wondering if you can talk through what's involved in getting that set up and incorporated into KubeDB, and whether there are any differences when you're working with systems that natively support clustering and therefore have an additional networking requirement, such as what you've done with Elasticsearch and MongoDB?
[00:33:21] Unknown:
Yeah. So the general process is fairly similar to what you would do if you were writing a new CRD and a controller for Kubernetes. All of our projects are hosted on GitHub, at github.com/kubedb. We have a subproject called apimachinery where we define all the API objects — the CRDs — and the generated clients. So the process you will have to go through is essentially to create the new API types, the CRDs, then run the generators to actually generate the clients, plus some additional files that need to be generated. And then for the database you want to add support for, at this point the easiest thing is to take one of the existing operators for an existing database, clone it, and make the changes you want for the new database type.
You go through whatever changes you need to make — usually around the StatefulSet, the Services, the ports it uses, and so on. Then there is the important thing you just mentioned: what kind of clustering support the database itself comes with. If the database needs some support from Kubernetes for those clustering aspects — using some kind of leader election, or some other Kubernetes API to detect its peers; for example, if it's using a gossip algorithm, it needs to find all its peers under the same Kubernetes service — then you will need changes in how the RBAC rules are given, because the operator needs permission to look up that information from the Kubernetes API server. And if we actually have to add custom components, the way we usually do it — at least what we have done so far — is to take the Docker image that is available, usually the official image on Docker Hub or some other image, add those components on top of it, and host the Dockerfiles in the project. Once you have that, you have something more or less working — at least something that will spin up a database of that type with clustering.
Those are the first aspects, and then obviously you have to go through how all the backup and recovery works, and whether you have any special features like those streaming backups. The benefit today is that, because we have done this for six different databases, we have enough example code that you can look at and essentially take from there and reuse. And some of those aspects have already been fleshed out as separate libraries, so you can use those libraries.
So that's how it should go. I would say, if you want to add a new database, you can probably get something working with a couple of weeks of work, assuming you are already familiar with regular CRD and operator development for Kubernetes.
[00:36:34] Unknown:
And I know that there are some other projects available, such as StorageOS, that provide some additional facility around statefulness and maintaining storage on Kubernetes clusters. So I'm wondering if you can talk about some of the other projects in a similar space and how KubeDB compares to what they're offering. So I
[00:36:58] Unknown:
looked at the StorageOS website recently, since we last talked. The way I look at it, StorageOS operates more at the level where, if you are on AWS, you have the EBS service — Elastic Block Storage. From what I understood looking at their website, StorageOS is at that level. If you are familiar with something called Rook, which is for running a Ceph storage cluster on Kubernetes, or GlusterFS, or I think there's something called OpenEBS — those are all in the same class. What they do is, if you are running on bare-metal clusters — and I have seen even people running on the cloud do this — you can get network storage using these kinds of solutions. So you get a network file system, and it provides all the Kubernetes primitives: PersistentVolume, StorageClass, PersistentVolumeClaim, maybe some sort of CSI driver — those concepts. But they are not really looking at the application layer. So the way it could work with KubeDB is that when you spin up, say, a Postgres database, each of those pods needs its own storage. One of the things you can do is get a disk from AWS, or from Google Cloud or Azure, whatever you are on. Or, if you are using something like StorageOS or any of the other similar persistent storage solutions for Kubernetes, you can get a disk from those solutions and use that as the storage volume and mount it.
Especially if you are on-prem, until very recently that was probably your best bet. But usually these solutions, because they are network storage, are by design somewhat slow. So if you are really looking for high performance, and you are on something like a bare-metal cluster with SSD disks and NVMe drives and all that, then recent versions of Kubernetes introduced GA support for local volumes, so you can get really high-performance local disks. That was just released at the end of last month, in March. And you can use that with Kubernetes, or with KubeDB.
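All of these backends — EBS, StorageOS, Rook/Ceph, OpenEBS, local volumes — surface to the operator through the same PersistentVolumeClaim abstraction, which is what keeps KubeDB storage-agnostic. A sketch of an illustrative PVC template (field names follow the standard Kubernetes PVC spec; the class names are examples):

```python
def storage_template(storage_class, size):
    """Build a PVC-style storage template. The operator only sees this
    abstraction; whether the storageClassName is backed by EBS,
    StorageOS, Ceph, or a local SSD is invisible to it."""
    return {
        "storageClassName": storage_class,  # e.g. "gp2", "storageos", "local-ssd"
        "accessModes": ["ReadWriteOnce"],   # one pod mounts each disk
        "resources": {"requests": {"storage": size}},
    }
```

Swapping storage backends then amounts to changing the `storageClassName` string, with no change to the database spec itself.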
KubeDB is agnostic in terms of how you provide the storage, as long as it comes through as one of those PersistentVolume or PersistentVolumeClaim concepts present in Kubernetes. So that is how I would compare these persistent storage solutions to KubeDB: they sit one level below and can work with KubeDB depending on which environment you are in. And then, thinking in terms of other database operator solutions for Kubernetes, there are a number of open source projects out there if you search on GitHub. More and more, we are seeing the database providers themselves offering some kind of operator-based solution for running their databases on Kubernetes. Sometimes those are not open source; they are closed-source offerings. And then you will see a lot of projects built by other folks who are using Kubernetes, ran into the same issue — I need to run a database, but the support is not really there, or there are various challenges — and ended up writing their own operator. But if you think in terms of making this the primary focus of a company, as a product, I don't think there is anybody else right now who is trying to make running databases on Kubernetes a product.
So I would say that in that aspect, we are fairly unique at this time. And as far as your experience
[00:41:01] Unknown:
building and maintaining this KubeDB project, what have you found to be some of the most interesting or challenging or unexpected lessons that you've encountered? Well, the biggest challenge for us, or I think anybody who's using Kubernetes today, is the rate of
[00:41:16] Unknown:
change, and the interesting thing is that it is very fast and, at the same time, not fast enough. If I can explain: Kubernetes has a major release every 3 months, and every release gets a bunch of new features. When we started, there was no concept of CRDs, like I said; there was just TPR. Then CRDs came, but when the TPR-to-CRD change happened, there was no live migration path. It was stop the world, do the migration, start over. Right? That was a big challenge if you're running databases, because as a user you don't care that Kubernetes has this limitation today — or had that limitation a couple of years ago. Those are the challenges. And then came RBAC support, and then they introduced something called pod security policy, and all those things. As those improved, just keeping up with them has been one of the interesting challenges for us — and for anybody who tries to write a Kubernetes operator. So that is one thing. The other thing is, like I said, the changes don't happen fast enough. If you are waiting on some particular feature, the way it works upstream — which does make sense from the upstream perspective — can take a long time, because if you want to introduce a new feature, it first goes through a phase of an alpha implementation.
Then it goes to a beta implementation, and then it goes GA. So even on the fastest path it still takes 3 releases, which is like 9 months. So if you are waiting for some particular feature that you know is coming, but you want it or need it now, you still have to wait, you know, 6 to 9 months. And in the Kubernetes world you do not want to use any alpha feature, because with an alpha feature there is no guarantee it will actually go to beta, and alpha features are turned off by default, so you have to go and change various knobs to turn them on. If you are building a product and you want it to work on any of the Kubernetes clusters out there in the wild — especially the managed services like GKE or EKS — they do not turn on alpha features. And on GKE especially, if you turn on any of the alpha features, they will delete the cluster after a month; they give you a big warning that this is not for production use. So that's one of the interesting challenges: I need this feature, it's coming, but it's alpha, so until it goes beta I cannot really use it. And going from alpha to beta can take multiple releases, depending on how complex it is or how much other coordination or change is necessary. So that has been one of the challenges: yes, it's happening very fast, but it's also not fast enough in some respects. And then the other challenge with doing application development for Kubernetes is that the Kubernetes upstream project at this point has fairly good documentation on how to use Kubernetes as a user or an app developer.
But if you think in terms of where we stand — we are actually building applications for Kubernetes — at this level the documentation quality is not really great. They have some sample projects upstream, and these things have been changing quite a bit in recent times. So if you are trying to build an application, say a new operator for Kubernetes for whatever purpose, learning how to do that — the documentation wasn't really great and it's still not very great. There are various things that are not well documented, so you actually have to go and read some of the source code and see how those things are done for some of the upstream stuff, maybe done by the same developers for whatever vendor they work for in that project. You have to read the actual source code to figure this stuff out. But people are very helpful in general: if you go to the upstream community Slack and ask questions, they will help you — but you have to spend the time to learn those things. That is a big investment in time and resources. So those have been the biggest challenges, just keeping up with these changes. And what do you have planned for the future of KubeDB?
Yeah. So in the near future, we are adding clustering support for MySQL and MongoDB; we're hoping to get those out fairly quickly, in a few months — hopefully this month. And then, as I mentioned, we are reworking the backup and recovery support: encrypted backups, how the retention policy applies — those are some of the key aspects. We look at those as the key things for the project to reach 1.0, so that is what we are focused on for at least the next quarter. Hopefully by summer we can get to a fairly stable position where all the things we think are necessary to hit 1.0 are there. On the other side, we are also working on a web console. Right now everything you can do with KubeDB is primarily through the CLI: there is the kubectl command line interface, and we also have our own command line tooling, which can be used as a plugin for kubectl but also works as a standalone CLI tool. So we are building a web interface. Think of how AWS RDS works: you don't have to know all the little details of the YAMLs.
You have a form, you fill in some of the fields, the others are prefilled for you, and then you get a database running. That is the kind of experience we think KubeDB users should have, so we are working towards that on the side; hopefully we can get that out quickly too. And at that point we'll probably explore adding more database support. Until then, I don't think we're going to add any additional databases unless somebody comes along — it's an open source project under the Apache 2.0 license, so if somebody wants to come along and contribute something, we are happy to look at that. But for our team, this is what we're looking at for at least the next 3 to 6 months. And are there any other aspects
[00:47:32] Unknown:
of KubeDB
[00:47:33] Unknown:
or managing databases on Kubernetes clusters, or any of the other work you're doing at AppsCode, that you think we should discuss before we close out the show? No, I think I have covered all the areas. I mean, one of the interesting things we have been doing recently is a service broker integration. If you're familiar with Cloud Foundry, it has this API called the Open Service Broker API, and Kubernetes has support for it via the service catalog project. So if you are on Cloud Foundry and want to run your databases on Kubernetes and integrate via the service broker interface, we have a project that can do that for you. It works fairly well if you are on Kubernetes with the service catalog, which also runs on Kubernetes. And recently we have been working with SUSE — they have been helping us integrate that into Cloud Foundry and their Cloud Foundry service. So that is some of the testing and development that's going on, and it is actually open source: if you go to github.com/appscode/service-broker, you can find that project. So I think those are the interesting things happening around KubeDB.
[00:48:45] Unknown:
That's good. And for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I think the biggest
[00:49:03] Unknown:
gap in the data management tooling is that if you are on a cloud provider, you usually have a fairly good experience, because you may be able to just use the managed services the cloud provider offers — as long as you are willing to take on the cost and some amount of vendor lock-in that comes along with that. But if you are looking at a hybrid cloud situation, or you have something not on the cloud, there are obvious tooling gaps that KubeDB tries to fill. The other issue I would point to is staying vendor neutral — not necessarily meaning you don't use any of the features a vendor provides, but keeping the option to move from one vendor to another, or to work across different vendors. There are limitations in what you can do there, because each cloud provider is usually focused on getting things working on their particular cloud, and that makes sense. In that kind of scenario, using something like KubeDB, or our Vault and Stash projects, can help you manage your data,
[00:50:07] Unknown:
not just the stateful applications, but any kind of stateful data you have to manage. Well, thank you very much for taking the time today to join me and discuss the work that you're doing with KubeDB, trying to simplify the work of running databases on Kubernetes clusters. It's definitely something that I've heard a lot of people talking about, and something that is not easy to do out of the box. So I appreciate your efforts on that front, and I hope you enjoy the rest of your day. Yeah, thank you, and thank you for giving me the opportunity to talk to you and to your audience
[00:50:38] Unknown:
about KubeDB. And if anybody has any questions, you can obviously get back to me via the contact information. Thank you. Have a good day.
Introduction to Tamal Saha and KubeDB
Challenges of Running Stateful Systems on Kubernetes
Benefits of Running Databases on Kubernetes
Implementation and Evolution of KubeDB
Lifecycle Management and Automation in KubeDB
Handling Database Upgrades with KubeDB
Adding Support for New Database Engines
Comparison with Other Projects and Solutions
Challenges and Lessons Learned
Future Plans for KubeDB
Final Thoughts and Contact Information