Summary
The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nearly constant basis. Despite that, there are a handful of databases that continue to be adopted due to their proven reliability and robust features. MariaDB is one of those default options that has continued to grow and innovate while offering a familiar and stable experience. In this episode, field CTO Manjot Singh shares his experiences as an early user of MySQL and MariaDB and explains how the suite of products being built on top of the open source foundation addresses the growing need for advanced storage and analytical capabilities.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- You wake up to a Slack message from your CEO, who’s upset because the company’s revenue dashboard is broken. You’re told to fix it before this morning’s board meeting, which is just minutes away. Enter Metaplane, the industry’s only self-serve data observability tool. In just a few clicks, you identify the issue’s root cause, conduct an impact analysis—and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
- Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
- Your host is Tobias Macey and today I’m interviewing Manjot Singh about MariaDB, one of the leading open source database engines
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what MariaDB is and the story behind it?
- MariaDB started as a fork of the MySQL engine, what are the notable differences that have evolved between the two projects?
- How have the MariaDB team worked to maintain compatibility for users who want to switch from MySQL?
- What are the unique capabilities that MariaDB offers?
- Beyond the core open source project you have built a suite of commercial extensions. What are the use cases/capabilities that you are targeting with those products?
- How do you balance the time and effort invested in the open source engine against the commercial projects to ensure that the overall effort is sustainable?
- What are your guidelines for what features and capabilities are released in the community edition and which are more suited to the commercial products?
- For your managed cloud service, what are the differentiating factors for that versus the database services provided by the major cloud platforms?
- What do you see as the future of the database market and how we interact and integrate with them?
- What are the most interesting, innovative, or unexpected ways that you have seen MariaDB used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on MariaDB?
- When is MariaDB the wrong choice?
- What do you have planned for the future of MariaDB?
Contact Info
- @ManjotSingh on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- MariaDB
- HTML Goodies
- MySQL
- PHP
- MySQL/MariaDB Pluggable Storage
- InnoDB
- MyISAM
- Aria Storage
- SQL/PSM
- MyRocks
- MariaDB XPand
- BSL == Business Source License
- Paxos
- MariaDB MongoDB Compatibility
- Vertica
- MariaDB Spider Storage Engine
- IHME == Institute for Health Metrics and Evaluation
- Rundeck
- MaxScale
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show.
You wake up to a Slack message from your CEO who's upset because the company's revenue dashboard is broken. You're told to fix it before this morning's board meeting, which is just minutes away. Enter Metaplane, the industry's only self-serve data observability tool. In just a few clicks, you identify the issue's root cause, conduct an impact analysis, and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. And if you mention the podcast, you get a free "In Data We Trust World Tour" t-shirt. Your host is Tobias Macey. And today, I'm interviewing Manjot Singh about MariaDB, one of the leading open source database engines, and the suite of commercial offerings that are available to build on top of that. So, Manjot, can you start by introducing yourself?
[00:01:52] Unknown:
My name is Manjot Singh. I am field CTO at MariaDB. I also lead our customer engineering department, which sometimes I'll call them ninjas and rock stars, but they all hate those words, so I just call them our top level experts. I'm really privileged to work with these really cool people.
[00:02:10] Unknown:
Do you remember how you first got started working in data?
[00:02:13] Unknown:
Yeah. That's actually a really interesting story. I got into making websites, I think I was 13, and I learned off HTML Goodies, and I thought, you know, wow, I'm really smart. Someone called me a script kiddie. I think it was the late nineties. And from there, I wanted to figure out how to direct that. And so I started a hosting company with a friend in high school. He had Comcast, and we put a Linux box in his living room, and we tried to host some websites. As you can tell, young kids, we weren't great at marketing. But I did start working on Perl websites for my local Sikh temple, and they just needed me to host some media. I used flat files, and I realized around, I wanna say around '99, 2000, that files aren't fun. Making websites in Notepad isn't great.
And there was this cool new thing called MySQL. And I think it was MySQL 3.23, which was probably the oldest version most people have run in production. I set that up with this cool new language called PHP, and I started just programming websites, making things for friends and people that would pay me, like, $20. And I went from there to creating a career, I think, in IT where I started as a sysadmin. And I accidentally fell back into actually using MySQL as a professional. I got solicited by a headhunter, and the headhunter goes, have you ever used MySQL? And I was like, oh, yeah.
Tons of times. I made all these websites. I'm still maintaining a lot of them. And she's like, great. You wanna do that as your professional career? And so I called one of my friends that worked at Sun, and they had just purchased MySQL. He goes, oh, here's all of the test material. Just read this and you'll do fine. So I go to the interview, I'm sitting in front of this panel. And at the end of the panel, the guy's like, you're the only one here that's actually installed MySQL, so we're probably gonna move forward with you. They were interviewing Oracle DBAs. And from there, I found definitely interesting that you've got such a long history with the, kind of,
[00:04:36] Unknown:
foundational technology and that you've now landed with the MariaDB project since this is an outgrowth of that MySQL heritage. And so wondering if you can give a bit of an overview about, for people who aren't familiar, what MariaDB is and maybe some of the story behind how it came to be its own entity. We don't need to dig into any sort of, like, the politics aspect of it, but, you know, just some of the history of how we came to be where we now have MariaDB and MySQL where they used to be the same thing.
[00:05:04] Unknown:
Yeah. And, actually, I learned some of this over a dinner with Monty a few years ago, and I'm sitting next to others that had come up with him. They're like, yeah. We heard of this cool thing. I made the PHP connector, and I did this. And I was like, this was the history of the Internet. Like, I'm like, you guys got together one night in college, and just created everything the Internet was built on. I was kinda starstruck, and it was pretty cool. I became a MySQL MariaDB fanboy when I started going to conferences with that first job, and I learned quite a bit. Monty made this product named after, I think, his first daughter, My, MySQL, or MySQL as they say in Europe, and that's still a debate in the company, by the way.
He created this cool database, and he had a lot of great ideas, one of them being the pluggable storage engine, where you could just trade out the back end that the data's being stored on and have a completely different use case. But the SQL syntax, the language, the handler, all of that would be the same. And I think that dream really led him to create this company and have success in selling it to one of the leading open source companies, Sun Microsystems. I think that when it sold to Oracle, it was unexpected by a lot of the team that was originally there. And I know this, like, third and fourth hand. Right? I wasn't there.
And for them, they still wanted to create that passion, that storage engine passion, because they saw Oracle owned Innobase, which became InnoDB, and they wanted to make that MySQL. Right? So it had the one storage engine. And on the other hand, we had so many other ideas. They wanted to compete with Oracle's enterprise database. They wanted to compete with other legacy RDBMSs, as I'll call them. Right? And I think that dream of having a really flexible open source database that doesn't necessarily have the chance to be closed source, like, for example, Oracle's enterprise MySQL is closed source. Right? I think that brought them to that. And his second daughter, Maria, he already had a storage engine named after her, which was, I call it MyISAM 2, kind of. Right? It does replace a lot of MyISAM's use cases. He took that. He renamed that to Aria, took the M off, and then named the whole database for MariaDB. And since then, we've really taken our own sort of view on it. We haven't been a fork in a long time.
For example, it's been our own code, which is open source, and we have a thriving community around it.
[00:07:42] Unknown:
Probably largely because of its association with Oracle and the direction that they were pushing MySQL, MariaDB ended up being the kind of broadly adopted community option for MySQL compatible workloads. And I know that MariaDB has focused heavily on maintaining that compatibility even as the two projects have diverged, and it has been a number of years now. I don't know the exact number, but probably at least on the order of 5 or 6. And I'm wondering how you have approached that challenge of being able to maintain compatibility between these two disparate projects that have, you know, widely diverging underlying implementation details now, but making it as smooth as possible for people to be able to migrate from a MySQL to a MariaDB without having to do a bunch of code changes accompanying it?
[00:08:38] Unknown:
That's a hard one. So I think SQL/PSM, which is the syntax that's used in MySQL and MariaDB and compatible databases, is something that's important to a lot of us. We wanna maintain that compatibility. Now there are a lot of places where we necessarily aren't compatible, but in most cases, I would say 99.99% of cases, you can drop in MariaDB. It's important for us to make it easy for our community to use our products. Right? And I think we're gonna do more of that. And so you have smart engineers at Oracle. You have smart engineers here, and at all the other forks. And a lot of times we come to the same conclusions. You'll notice that MariaDB did make a lot of features first, and eventually, they went into the other forks or into MySQL, and vice versa, of course.
And you'll find that a lot of times, they're approaching similar problems and solving them the same ways. And a lot of that is customer driven. Right? If we have customers that say, well, we wanna do this, or we wanna migrate from... we help them with that. Right? We make it as easy for them as possible, I think. And we do try to be syntactically compatible, just like we have with our Oracle layer, for example. You can actually use PL/SQL in MariaDB and ANSI SQL.
[00:09:58] Unknown:
As far as the ways that that shapes the overarching project, I'm wondering if you see that kind of constraint of maintaining broad compatibility with MySQL as a benefit because it provides focus or as a, you know, set of shackles because you want to go in your own direction, but you can't justifiably do that because then you're going to be kind of ditching a whole bunch of customers who you would otherwise be able to support.
[00:10:25] Unknown:
Yeah. And I think that's a fine line, and I think we've walked that fine line pretty well. I think if we go too far one way or the other, the community tells us, and we do what's best for our users and our customers. And I think we do it better than MySQL. And I'm not trying to put them down, but they have a very clear, we support this version, and we'll support replication from the last version and features from the last version. For us, we've actually struggled because we have so many versions in support most of the time, and I think that kinda led to some of our recent changes. But there was a time we had 5 GA versions just recently that were all in support, and we could replicate as far back as, like, 5.1.
I just helped a customer. They went from 5.2 to 10.3, skipping all those versions, and we could replicate from any of them. So I feel like we've done a pretty good job of maintaining that replication, that compatibility, and I think we're unique in that aspect.
[00:11:32] Unknown:
Kind of projecting forward a little bit, I'm curious if you see anything that would potentially motivate you to say that, you know, we want to chart our own path. We have, you know, diverged far enough. It has been enough time between the initial fork to where we are now that we don't feel bound to MySQL as kind of defining our future trajectory. We want to go be our own thing. You know, if you want to migrate from MySQL, then you'll have to do it, you know, from this older version and then upgrade to where we are now and just kind of, like, cut that tie.
[00:12:06] Unknown:
I personally feel like we're already there. I mean, we've been there for some time. You know, we took our own path by adding the Oracle compatibility. We don't have some of the 5.7 features the way that MySQL has it. We did our own methods, right? For a lot of these features. There's differences in the way we do JSON and GIS. They're very minor and they're easy to work around, but I think our software architects would tell you that we did it better.
[00:12:35] Unknown:
And so in terms of the places that you have gone your own or added your own capabilities, what are some of those features that are unique to MariaDB and aren't available either in MySQL or even, you know, the broad majority of other either open source or commercial relational engines?
[00:12:53] Unknown:
If we just look at MySQL and its forks, storage engines. Right? We have a storage engine that is just like a read only engine, Aria, but it is transactional. We're also fully ACID compliant, which MySQL wasn't until they removed MyISAM. We have column store. That's the big one, I think. Being able to do analytics and join them to your OLTP tables easily within the same command line. And perhaps with MyRocks, having a high write workload. I worked with a client that put in MyRocks, and they were doing millions of rows an hour or probably even more than that. And they were struggling on InnoDB and other engines. We put in MyRocks, and they were doing great. So there's a lot of features there, but I think there's also value in our other products such as Xpand. Right?
You put that together with our enterprise server, you have a command line compatible database. Again, SQL/PSM, which is important to us. It's actually more similar to SQL/PSM and MySQL than MariaDB in some cases, but it's distributed SQL. No more worrying about sharding, no more worrying about how do I add and remove nodes, create replicas. It just does it. And that's pretty cool. That's something that, as a DBA, I spend a lot of time on. Right? Let's create a replica. Let's copy the data over. Let's make sure all the IDs are correct. There's none of that with Xpand. And MaxScale actually brings that to enterprise server as well.
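The pluggable storage engine idea that comes up in this conversation (one SQL layer, interchangeable back ends) can be pictured with a toy model. This is a minimal sketch in Python, not MariaDB's actual handler API; the `StorageEngine`, `RowEngine`, `AppendLogEngine`, and `Table` names are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine interface: the layer above only calls these methods."""

    @abstractmethod
    def write(self, key, row): ...

    @abstractmethod
    def read(self, key): ...


class RowEngine(StorageEngine):
    """Keeps the latest row per key in a dict (update in place)."""

    def __init__(self):
        self._rows = {}

    def write(self, key, row):
        self._rows[key] = row

    def read(self, key):
        return self._rows.get(key)


class AppendLogEngine(StorageEngine):
    """Appends every write to a log; reads scan backwards for the newest entry."""

    def __init__(self):
        self._log = []

    def write(self, key, row):
        self._log.append((key, row))

    def read(self, key):
        for logged_key, row in reversed(self._log):
            if logged_key == key:
                return row
        return None


class Table:
    """Stand-in for the layer above the engine: same calls, any engine."""

    def __init__(self, engine):
        self.engine = engine

    def insert(self, key, row):
        self.engine.write(key, row)

    def select(self, key):
        return self.engine.read(key)


# The same sequence of calls runs unchanged against either engine.
results = []
for engine in (RowEngine(), AppendLogEngine()):
    table = Table(engine)
    table.insert(1, {"name": "first"})
    table.insert(1, {"name": "updated"})
    results.append(table.select(1))
```

The two engines trade off differently (in-place updates versus append-only writes), yet the caller's code is identical, which is the property described above for swapping engines under one SQL interface.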
[00:14:22] Unknown:
Digging more into sort of the commercial offerings and what you see as being kind of commercial capabilities, I'm wondering if you can talk to some of the kind of broad use cases and features that you're aiming for with those commercial options and how you think about, you know, if and/or when those capabilities, you know, might land into sort of the community distribution.
[00:14:52] Unknown:
Yeah. So our features do go to enterprise first in a lot of cases, but they'll make their way to community. So far, it's been, like, I'd say a year or 2 lag, not anything crazy. I can't speak to the policy on that necessarily, but I'm a big open source MySQL MariaDB fanboy, and I think that's what, I guess, people like about me, but at the same time, we need to have something that is stable and secure and compliant for our customers, and I think that's really the differentiator. And eventually, all of that needs to be in community as well, just to make the world a better place, and I think a lot of it does. You know, even BSL, as much as some other companies, some of our competitors would like to put it down, it's somewhat fair. Right? We make our money off the code. It's like a patent. Right? We'll sell it, our customers benefit, our users benefit, and a few years later, it's GPL, and someone can innovate on that if they want to.
[00:15:54] Unknown:
So digging more into the kind of capabilities of MariaDB and particularly some of the enterprise oriented features, you were talking about Xpand. I have definitely had many conversations with people who have banged their head against replication and sharding and trying to scale out their MySQL installations. I'm wondering if you can maybe start with some of the underlying architectural concepts that go into Xpand and making it able to kind of scale out horizontally without all of that additional headache and manual fine tuning.
[00:16:26] Unknown:
I come from being a DBA and consultant. Right? So not a C developer, so I'll speak to how it behaves. And I think Xpand, being a Paxos-based cluster, has some features that are pretty unique. It's distributed SQL, it lets you do, I would say, a lot of parallel queries, and at most, you get, like, 2 hops. So it knows where the data is. It cuts it up into slices and it replicates it. So it copies these replicas, like, to each node, you can actually control that. So you could have no replicas or you could have every node hold all the data. But if you're someone like Samsung, for example, you might have 30, 40 nodes in a cluster, and you can actually locate the data in different places. I'm sure a lot of that is familiar with other distributed SQL products.
The slices are pretty cool. They hold little parts, I would say, of your table. And those sets of rows, it knows how to track and join them in a parallel fashion and take advantage of all of the hardware across your cluster. That's something unique that a lot of MySQL MariaDB users might find valuable. Right? And you can add another node, it'll automatically rebalance the clusters, so it has this feature called rebalancer. I think that's really cool as well. When I worked at HP Helion, for example, I helped put in 1 of the early versions of Elastic into the cloud. And this reminds me somewhat of that, the way that it slices things up. But I think it's very sophisticated and smart, and that's attractive to me.
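The slice, replica, and rebalance behavior described above can be illustrated with a small model. This is a sketch under stated assumptions, not Xpand's real algorithm: the CRC32 hash, the round-robin placement, and the names `slice_for`, `place_slices`, and `load_per_node` are all hypothetical choices made for the example.

```python
import zlib


def slice_for(key, num_slices):
    """Map a row key to a slice with a stable hash (CRC32 is an assumption here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def place_slices(nodes, num_slices, copies):
    """Assign each slice to `copies` distinct nodes, round-robin."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    placement = {}
    for s in range(num_slices):
        placement[s] = [nodes[(s + i) % len(nodes)] for i in range(copies)]
    return placement


def load_per_node(placement):
    """Count how many slice copies each node holds."""
    load = {}
    for holders in placement.values():
        for node in holders:
            load[node] = load.get(node, 0) + 1
    return load


nodes = ["node1", "node2", "node3"]
before = place_slices(nodes, num_slices=12, copies=2)

# Adding a node and recomputing placement stands in for a rebalance step.
after = place_slices(nodes + ["node4"], num_slices=12, copies=2)

# A lookup needs only the slice number and the placement map.
target_slice = slice_for("customer:42", 12)
holders = after[target_slice]
```

With three nodes each holds 8 slice copies; after the fourth node joins, each holds 6. A production rebalancer would also try to minimize how much data moves, which this naive recomputation does not attempt.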
[00:17:55] Unknown:
Another 1 of the offerings is SkySQL, which is serverless database as a service. And I know that that is definitely a direction that a lot of people are moving to with the growth of cloud, with a lot more kind of cloud native workloads where you don't want to have to think about where is the database, what is it running, how do I scale it. And so I'm curious if you can talk to what that offering looks like and some of the capabilities that that brings on top of the underlying MariaDB engine.
[00:18:25] Unknown:
Our goal with SkySQL is security first. Right? That's one of the... our early PMs was, like, secure by default. And that's actually been our take with enterprise server and our other products. But SkySQL is really about enabling any workload at any scale, but you're not having to deploy all these random other products. Right? I have to go over here so I can have documents. I have to go over here for column store. I have to go over here for analytics, and I have to go over here for my OLTP. And with SkySQL, you can have a lot of that just in one command line. Right? And you can join across many of those features. So any workload, whether it's transactional or analytical, and then any scale. So Xpand is actually really powerful for us there because of that elastic scale out and scale in. We have one client that every Black Friday. Right?
Early on, before we had the ability to do this, we would just make their nodes bigger. Right? Black Friday's coming up. We need 128 cores or whatever. Actually, 72 cores for them, and I think that's the max. And we would give them many replicas, and they would go through that, and then they'd say, size us down at the end of the year. That's okay. And we have the SkyDBA offering, which is experts that will run your database. That's pretty unique. Right? You don't have that with any other cloud. Like, hey. Can I get a DBA to come in and alter my table? Well, you can with SkySQL, and I used to lead that team. These are really bright guys and gals, actually. And I think that is powerful. And now with Xpand, we can just have their nodes grow and shrink.
They're very elastic when they have that type of workload happening. And then we're adding things like geospatial APIs, it's kind of data as a service, but it has a lot more going on, and analytics over your cloud storage, cloud buckets, things like that. So I think the value of having database experts and data experts run your database as a service and more is pretty exciting and pretty unique. And that's kind of what excites me to be here a lot of the time, is there's so much potential in having this powerful, automatic, but not automatic because you can pull in experts, database product.
[00:20:51] Unknown:
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability.
Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
For a number of years, you know, database in the cloud has largely looked like things like Amazon RDS or Google Cloud SQL, where it's, yes, we will run that for you and we'll automate some of those kind of DBA options of managing, you know, having a fit active failover in a different zone or, you know, managing that replication, but it's not going to resize it for you or scale it out unless you're using one of their kind of proprietary options.
And I wonder what you see from having worked with SkySQL and providing that as a service. I'm curious what you see as some of the future of databases in the cloud, you know, where we have had this paradigm of that RDS, Cloud SQL, and now we're exploring a lot more with these dynamically scalable, largely serverless, kind of just throw the data in and it does what I want it to situations.
[00:22:47] Unknown:
Yeah. And that's definitely something that we're working on. We have a few things up our sleeves on that. But I think workload automation, tuning automatically, indexing automatically, these types of things are the future of database as a service. You can do all of that, but if you don't have a person in a lot of cases, and this is true, I think a data scientist would tell you that as well, there's cases where the machine learning will miss, or it'll just be flat out wrong. And having an expert to fall back on, which I'll admit is not scalable, but we do a pretty good job, I think a pretty fantastic job with Fortune 50s on this.
Having that expert step in and remind the automation where things need to be, I think is value. And, you know, you could have this new generation relational database, you could have something that's supporting millions of users, but the wrong alter at the wrong time, even if it's done by smart automation, could take down your cloud, or your mobile app, or your website. And I think that's really the differentiator there. So, you know, serverless is coming. Serverless is this new hot, but it's still someone else's computer. You know what I mean? Always.
[00:24:09] Unknown:
Yeah. You mean to say that it doesn't just mean that it's all happening mystically in the cloud? There are no machines happening under there? I don't have to worry about actual machine failures?
[00:24:20] Unknown:
It's all magic. It's all magic. If only.
[00:24:24] Unknown:
Powered by unicorns. Yeah.
[00:24:26] Unknown:
I mean, I would say my services team wouldn't be so successful. My enterprise architects that work with these really large companies would be out of jobs if other clouds did it as well as we did.
[00:24:40] Unknown:
And so another interesting element that I really wanna dig into, particularly given the focus of this podcast, is the column store capability that you're offering where you do have the ability to treat the data with that HTAP paradigm. There are some other systems out there that offer that capability, you know, to varying levels of success. Most of them aren't open source. I'm wondering if you can describe some of the kind of architectural fundamentals that go into providing that columnar view on the data and being able to map from the transactional into the analytical workspace at the engine level?
[00:25:17] Unknown:
Yeah. So columnstore, it stores each column as separate files in the file system, or tablespaces, or whatever you wanna call them. These are stored usually on cloud buckets, cloud services, and we also have the ability to store them on your local network, or S3-compatible storage. It's all replicated in that analytical cluster, and you can actually have your front end MariaDB, which many people still see as that old database, but have not seen the new modern MariaDB. Obviously, it's evolved. Right? A lot of people have been using whatever in their enterprise for years, and are like, MariaDB 10.5, 10.6? We're on 10.11.
Right? And we have all these new modern features that developers would love, you know, MongoDB compatibility, Oracle compatibility. Right? And I think that having that going through that interface, I would say, of a highly available database with a standardized, you know, the MySQL, SQL/PSM, standardized SQL interface that can actually join that data from those storage engines, from columnar, where you can ask, what are all of my thermometers across the state saying, and join that with your local, you know, DB data, and perhaps even shard it with Spider to many databases behind the scenes. It's just, it's amazing, the potential.
And I've talked about a lot of that with some of our customers, and some of them are using them in those really novel ways, but we've also added an analytical index similarly in Xpand. So you get that it's shared nothing with Xpand, right? So there's a little bit of an advantage there, and you can actually run queries with indexes that are stored in a more columnar fashion there. I think it's the only distributed SQL database to do that. On the other hand, columnstore is really great for those analytical, just straight up analytical queries, OLAP.
We do have quite a few BI use cases. So columnstore gives us a powerful way to move data from OLTP to OLAP live. Right? Stream it. And I think you join that with a lot of the tools we all know and love. It's cool. You don't have to go learn something else like Vertica.
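To make the "each column stored separately" idea concrete, here is a minimal Python sketch. It is an illustration of the general columnar layout only, not MariaDB ColumnStore's actual file format; the sensor records, `to_columns`, and `average` are hypothetical names made up for the example.

```python
from collections import defaultdict

# Hypothetical row-oriented records, the shape an OLTP table would hand back.
rows = [
    {"sensor_id": 1, "city": "Austin", "temp_c": 31.0},
    {"sensor_id": 2, "city": "Dallas", "temp_c": 29.5},
    {"sensor_id": 3, "city": "Austin", "temp_c": 33.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def average(columns, name):
    """Aggregate a single column without touching any of the others."""
    values = columns[name]
    return sum(values) / len(values)


columns = to_columns(rows)
avg_temp = average(columns, "temp_c")
```

An analytical query such as an average only has to read the `temp_c` list, which is the reason a columnar layout tends to suit OLAP-style scans better than a row layout does.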
[00:27:39] Unknown:
See, back to your point of you have people who have been using MariaDB for years now, and they don't realize the capabilities that have been added in. I'm curious how you think about that kind of customer communication and customer education of, oh, you want to do that. Well, it already does that out of the box. You don't need another system. You know, we're just kind of surfacing that information for people who do start to kind of over architect their systems because they don't fully understand or they don't take the time to reacquaint themselves with what new features are being introduced in each release.
[00:28:12] Unknown:
I've been in shops where they have a lot of people that are just out of college, and they'll be like, well, the front end's on MongoDB, and then we got this cloud service to do our analytics, and we're running a messaging queue, and we're doing this and that. Cool. Then they'll call us back after they have thousands of customers. We can't handle the workload. What do we do? Well, first, stop storing all of your data a million times so that it renders faster. Because you can still do that with an OLTP database. It might not be hot and cool, but it's survived this long because it works. Right?
And I think you start there, and now you're like, well, why are you paying x amount of money to Oracle, Amazon, whoever else? You already have your analytics right there. Just create table, engine equals ColumnStore. Right? Or our orders table is going crazy. And I can't tell you how many companies have said those words. Our orders table, we can't handle the writes. What do we do? Oh, okay. Alter table engine equals MyRocks, or copy the data to Xpand and just add nodes as you need them. It's really simple if you know the products and you remember that MariaDB is modern, which people are kinda shocked. Like, I'll walk in, and I'll just do a quick brown bag. They'll be like, what?
We've been using 10.0 or 10.1 for so long, or even 5.5. You do all these things. I'll be like, yeah, let's upgrade. Let's upgrade. And now you have magic. Right? Obviously, you have to teach them how to use it. What are the benefits? What are the pitfalls? I'm really big on that, being sort of honest on that in terms of, I guess, what's important to our customers. You see that, like, where we run, for example, the ServiceNow Cloud, which is obviously not the young company I was talking about, but they have a lot of MariaDB running, quite a lot. And I think that's one of our cool use cases where they can actually just make it do anything because of MariaDB.
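The "create table, engine equals" and "alter table engine equals" statements mentioned above could look roughly like the output of this Python sketch. The table and column names are hypothetical, and the helper functions are made up for illustration. One assumption worth flagging: in MariaDB's DDL the MyRocks engine is, as far as I know, spelled `ROCKSDB`, so that is the name used here.

```python
def create_table_sql(table, columns, engine):
    """Build a CREATE TABLE statement that names the storage engine explicitly."""
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} ({column_list}) ENGINE={engine}"


def alter_engine_sql(table, engine):
    """Build the statement that moves an existing table to another engine."""
    return f"ALTER TABLE {table} ENGINE={engine}"


# Hypothetical analytics table placed on the columnar engine.
analytics_ddl = create_table_sql(
    "order_facts",
    [("order_id", "BIGINT"), ("total", "DECIMAL(12,2)")],
    "ColumnStore",
)

# Hypothetical write-heavy table moved to MyRocks (assumed DDL name: ROCKSDB).
write_heavy_ddl = alter_engine_sql("orders", "ROCKSDB")
```

The statements are only built as strings here; running them would need a server with the relevant engine plugins installed, and an engine change on a large table rewrites the data, so it is worth testing on a copy first.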
[00:30:22] Unknown:
Digging more into the kind of data architecture aspect of an engine like MariaDB, where it does have these pluggable storage capabilities. It is adding new kind of data primitives in the engine. You were talking about GIS. You know, there's the question of JSON. I'm wondering what are some of the kind of interesting misconceptions or misuses or underuses of MariaDB that you've seen for people who are trying to build, you know, more complicated applications or, I guess, more commonly, trying to use MariaDB as their transactional store and then replicate that into some other system for being able to power analytical workloads or being able to build derived data products on top of those transactional resources?
[00:31:07] Unknown:
I think the biggest one is JSON. Right? I have unstructured data. Okay. Well, we won't put it in MariaDB, let's go to whatever document DB they wanna go to. Right? And I'll be like, well, just put it in this table and call it with JSON functions. Run it through our MongoDB NoSQL router. It's compatible with MongoDB. There's lots of options there, right, for unstructured data. And you show them that, and that they don't have to duplicate their data. Now you've saved money on storage. You're no longer having to maintain many open connections out of your unique ability.
Some of the other unique ones that are coming to mind, I was thinking of a drug store I've worked with. They actually have an application that uses MariaDB for the front end web page, but they render a lot of things out of a document store. They have their back end shopping cart in SQL Server. They store their customer data in Oracle. As an engineer and an architect, it just doesn't make sense to me. Why are you spending so much money with so many vendors, and having to maintain SMEs and developers for all of these products when you could just use one of them. Right? And that's my struggle as a leader. Right? I don't know that as an engineering leader, I would make that choice.
I find it difficult to justify something like that. They were also putting in an analytics store, and I was like, you have ColumnStore here. They were creating connectors to be able to access other databases for DB links. I was like, we have Spider. You just connect the databases and copy the data over or access it directly. I think there's a lot of education or a lot of things that people misconstrue because maybe they used it once in college, and maybe they used the old one. Like, if you went to current versions of Ubuntu or RHEL, sometimes you'll get 5.5 when you do yum install MariaDB or MySQL.
You get an old version because you may not have added our repos. And that creates a misconception, I think. We're not the WordPress database. Right? We're the database that's running Samsung Cloud, ServiceNow's cloud. Right? IHME, we were all looking at it during the COVID pandemic. Right? How many cases are there? What are the estimates? Oh my god. Bill Gates said this many people are gonna die. Remember April 2020. Right? That was all powered by MariaDB ColumnStore. That wasn't Microsoft doing something powerful. This came from IHME, which is part of the Bill and Melinda Gates Foundation, and they're one of our big customers. They use our SkyDBAs.
They help them manage that columnar data. There's a lot of cases I see with customers. I've seen a lot of similar themes with the hundreds, maybe thousands of customers I've worked with in my consulting career.
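The "put it in this table and call it with JSON functions" suggestion can be sketched as a handful of SQL statements driven from Python. This is an assumption-laden illustration: the `events` table and its fields are hypothetical, `RecordingCursor` is a dry-run stand-in so the snippet runs without a server, and a real setup would pass a cursor from whatever DB-API driver is in use. `JSON_VALUE` is a MariaDB function; the `JSON` column type is, to my understanding, an alias for a text column with a validity check.

```python
# Hypothetical schema and queries for keeping documents next to relational data.
STATEMENTS = [
    "CREATE TABLE events (id INT AUTO_INCREMENT PRIMARY KEY, payload JSON)",
    "INSERT INTO events (payload) "
    "VALUES ('{\"device\": \"thermostat-1\", \"temp_c\": 21.5}')",
    "SELECT JSON_VALUE(payload, '$.device') AS device "
    "FROM events WHERE JSON_VALUE(payload, '$.temp_c') > 20",
]


class RecordingCursor:
    """Dry-run stand-in for a DB-API cursor; it only records what it is given."""

    def __init__(self):
        self.executed = []

    def execute(self, statement):
        self.executed.append(statement)


def run_statements(cursor, statements=STATEMENTS):
    """Send each statement to the cursor in order and return how many were sent."""
    for statement in statements:
        cursor.execute(statement)
    return len(statements)


cursor = RecordingCursor()
sent = run_statements(cursor)
```

Swapping `RecordingCursor` for a real connection's cursor is the only change needed to try this against a test database; the point is simply that the document and the query over its fields live in the same table as the rest of the data.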
[00:34:11] Unknown:
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up for free today at dataengineeringpodcast.com/rudder.
Another interesting aspect of just kind of the way that databases are used is the question of how much of the business logic should live in the database versus how much of it lives in the application, and the database is just there for dumb storage. And I'm curious what you see as some of the ways to gauge that balance based on kind of the use case and the capabilities of the underlying engine.
[00:35:06] Unknown:
It's an inclusive or. I was gonna say yes. So it's really dependent. I think when I was early in my consulting career, everybody had to normalize their data. Actually, early in my DBA career. Normalize all your data, put all of your logic in your app, take it out of the database, and we're happy. I think I've learned there's a lot of good cases for flattening your data. There's a lot of good cases for triggers and stored procedures. I would say you don't wanna put, like, a 2,000 line stored procedure in your database anywhere, but especially not in Xpand. It's meant to be a really fast OLTP engine with that parallel replication, right, and higher throughput.
But if you're doing ColumnStore, put that 2,000 row stored procedure in MariaDB Enterprise. It'll handle it. And write it in PL/SQL if you want. You can do packages. Right? So probably with InnoDB as a storage engine, I'd find a balance there. I think 500 lines is egregious. I've installed Rundeck in a lot of companies, or said, hey. You have Jenkins. Right? Or a cron job with Bash. You know, there's other ways to do things or use your application, create a microservice. There's a lot of ways that are efficient, and I would say something I learned from one of my mentors was it depends.
[00:36:32] Unknown:
Yes. The ubiquitous answer.
[00:36:35] Unknown:
Exactly. Yep. With a Russian accent.
[00:36:39] Unknown:
Even better. And maybe digging a bit more into some of the, I guess, data type specific or kind of specific kind of logical capabilities of MariaDB, I'm wondering what are some of the ones that you find most interesting or most compelling, and either ones that are upcoming or that were recently introduced or things that have been around for a while but that people often overlook or don't realize are there just as far as like, oh, I didn't know you could just do a, you know, select call function name, and it's magical.
[00:37:12] Unknown:
That's hard. So on my Twitter, a lot of times, I'll do a MariaDB doc of the day hashtag, because I learn every day. I think I would not be successful if I didn't constantly ask questions and learn on a daily basis. I'll start with as a company, Samsung Cloud, like 10,000,000,000 daily requests on Xpand. Like, I've seen the monitoring graph. I was like, oh my god. Supercat. Right? They use Xpand for games, and it handles these crazy workload increases when games are popular. Or a Fortune 500 financial company, you know, maybe you have your retirement on them. $2,000,000,000,000 in assets are on Xpand.
And I think that type of thing is like, wow. Right? Those are interesting points to me that kind of make me proud to work here beyond being the fanboy and being like, oh my god. I work with the people that used to be my mentors. Right? That's pretty cool. But on the feature side, like, I just learn amazing things. Like, we have a procedure that'll scan your table and be like, that VARCHAR(255), you've never had more than 50 characters in it. It should be 50 characters. Right? Or I guess the power of MaxScale. Like, just every day, I'm like, oh, it does that?
That's neat. I mean, it has an IDE GUI. Did you know that? A lot of people don't know that. It has the ability to split your reads and writes, but rewrite them if you want them to. You could be like, every time someone types select 1, make it select 2. You can have it pipe things to Kafka. Every query can just be mirrored there. There's just, like, all these cool things that I learn about our products that I think are really cool. And that's where in my field of experts where I'm like, did you know this? He's like, yeah. Yeah. I've been dealing with that for a while. And I'll go to another, and it's just really cool. It's really cool to be in a place that values learning, and I think teaching that to our customers and our users is really exciting. I think you could see some of that passion in my YouTube videos.
So sometimes I'll be asked a question, I'll be like, I'm gonna go make a YouTube video about that. It's pretty neat. I'm actually doing some more next week.
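The read/write splitting, query rewriting, and query mirroring described above can be pictured with a toy proxy. This is a Python sketch of the general idea only, not how MaxScale is implemented or configured; `Proxy`, `route`, `rewrite`, the keyword list, and the in-memory `mirror` list (standing in for something like a Kafka topic) are all hypothetical.

```python
import re

# Statements starting with these keywords are treated as writes (simplified).
WRITE_KEYWORDS = {"insert", "update", "delete", "replace", "create", "alter", "drop"}


def rewrite(query, rules):
    """Apply each (pattern, replacement) rule to the query, case-insensitively."""
    for pattern, replacement in rules:
        query = re.sub(pattern, replacement, query, flags=re.IGNORECASE)
    return query


def route(query):
    """Send writes to the primary and everything else to a replica."""
    words = query.split()
    first_word = words[0].lower() if words else ""
    return "primary" if first_word in WRITE_KEYWORDS else "replica"


class Proxy:
    """Toy proxy: rewrite the query, mirror it, then pick a target server."""

    def __init__(self, rules):
        self.rules = rules
        self.mirror = []  # in-memory stand-in for a message queue topic

    def handle(self, query):
        rewritten = rewrite(query, self.rules)
        self.mirror.append(rewritten)
        return route(rewritten), rewritten


# The "every time someone types select 1, make it select 2" example.
proxy = Proxy(rules=[(r"^\s*select\s+1\s*$", "SELECT 2")])
read_result = proxy.handle("select 1")
write_result = proxy.handle("INSERT INTO t VALUES (1)")
```

A real proxy has to cope with transactions, session state, and statements that only look like reads, so keyword matching like this is just enough to show where the routing decision sits.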
[00:39:34] Unknown:
We've touched a lot on some of the kind of interesting or innovative or unexpected ways that you've seen MariaDB used. Are there any other examples that you wanna call out?
[00:39:42] Unknown:
I think I've talked about a lot of our use cases with the larger ones, but I've seen small use cases. I think that's pretty unique. Right? We do have embedded use cases still. I think MySQL started that way. MariaDB continues that. So you might be using a router that has MariaDB running on it right now. You might be using some other product that has embedded Linux with MariaDB. I think that's pretty cool. I'm not saying we're SQLite, right? But there's a lot of use cases like that I find really interesting because people have thought outside the box and gotten it to do something interesting.
Like that temperature thing that I mentioned earlier. Like, we do have some users that I think my thermostat actually, my smart thermostat uses ColumnStore in the back. So I think that's pretty cool. Like, I also think it's weird that the smart thermostat company happens to know the temperature in my house, but, you know
[00:40:43] Unknown:
Yep. And in your experience of working at MariaDB and helping to direct the products and understand the use cases and technical requirements of your customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:40:59] Unknown:
I think coming up as a consultant and a fanboy, again, I keep saying that, but I had a lot of faith in some of the products, especially what certain things can do. And I think one of the hard lessons I learned is somebody will use our products in a unique way, and I'll find if they're using an older version where we haven't necessarily squashed something, we have to redirect them to a newer modern version. And one of the forks, for example, had a bug with losing data on backup, and that wasn't a MariaDB product. But they were coming to MariaDB using those backups to restore. And that meant that we had to think outside the box and create an enterprise backup solution that helped that customer succeed. And I think that kind of led into, well, let's make an enterprise backup solution, and that's what we did.
But you don't know when someone's gonna run 2,000 rename tables a second. You just don't. And that was the case there. You don't know if somebody's going to lock 20 tables because they thought they were locking one row. It's a hard lesson, and I think understanding that and asking questions and learning, and then continuing to make your products better and meet those use cases, I think is really cool. So that's why, like, ease of use and the developer experience is really important to us right now. We have a developer advocate, and he's been helping us work through, well, how can we make it easier to use our products? You know, we're trying to move away from manual sharding, Xpand. Right? We're trying to make it easier to install and configure our databases. With SkySQL, we wanna make it so you're not even worrying about that. You're just worrying about, is my data there? Can I access it? Is it fast?
Right? Is it durable? Is it atomic? You know, those are, I think, more important things for developers and users than, oh, man, I gotta take a backup. I gotta do this. That's kinda where we're at. I guess I look at what did I hate as a DBA? And I talk to our customers and users. I'll be like, do you hate that too? Okay. And I go talk to our PMs, and we get it done. So I think there's that, you know, we still listen to, like, what the needs are, and what do they need out of a cloud? And that might not be something they can get from a hyperscaler. Right? We have a big advantage in that we own our products. We can make changes. We can add features or fix bugs, and that's all surfaced in SkySQL, actually. So SkySQL has been a big benefit because more often than not, we'll see a feature need or a bug in SkySQL before our customers see it.
And I think over the last few years, you'll see that we've done a lot of innovation because we need to make our fully managed cloud better. And that's not something necessarily the hyperscalers can do with all the open source projects that they've kind of just lifted and put into their product suite.
[00:44:16] Unknown:
For people who are looking for the storage system that they want to use to power their data use case, whether it's a transactional application, just a CRUD app, or a, you know, high scale data analytics use case or anything in between? What are the cases where MariaDB is the wrong choice?
[00:44:36] Unknown:
We don't have a messaging queue. I like Kafka. There's a lot of options out there though. We work well with Kafka. I would say that's a hard one. That's a really hard one for me to answer because, again, when I say the word fanboy again, there's a lot of times where I don't necessarily see why someone would do something. There are definitely reasons to use Oracle or SQL Server or other databases, but I haven't seen them in a lot of our customers, I'll be honest. And that's because we have that flexibility in our use cases and our storage. You can't run SharePoint on MariaDB.
That's one that's difficult, of course. I think there's a lot of Oracle products that require Oracle database. That makes it difficult. So you probably can't do that. But I've had people try to varying degrees of success. I would say that's a hard one. That's a really hard one for me.
[00:45:43] Unknown:
And as you continue to work on MariaDB and help to grow its capabilities and work with your customers to understand their requirements and use cases. What are some of the things you have planned for the near to medium term, or any problem areas that you're excited to dig into?
[00:45:57] Unknown:
I'm excited that we're gonna go public soon. But with our customers and our products, I'm excited to kinda dig into geospatial. It's not something that I've done a lot of, but I've recently been speaking with the CubeWerx team from our recent acquisition, and these are guys that worked on, like, Oracle's geospatial features, like, 30 years ago, and now they've created something, like, really cool. It's something I'm excited to learn about, and I think, again, you can see, like, my love of learning. I think getting into that space and wanting to become an expert in that is really cool, but I'm also really excited about, like, our command line tools. I mean, it's not sexy, but, like, making things easier for people's day to day is, like, my passion, I think.
Making it so that our users aren't wondering why we made an engineering choice, I think is really important. Because I don't want people to think of that old thing that I'm nostalgic for. I want them to think of the future of database, right, when they think of us. And so you do have cool things like that, geospatial and the analytics and whatnot. But I think Xpand is gonna do some really cool things, and it already is. And that's something that I've been excited about since we brought Xpand in. But now that we have it in SkySQL and we have these really cool customers, I think we're gonna see more of that.
[00:47:26] Unknown:
Are there any other aspects of the MariaDB project or the overall product suite or the business itself that we didn't discuss yet that you'd like to cover before we close out the show?
[00:47:41] Unknown:
I would say that a lot of people want a great quality customer experience with experts that know what they're talking about on the first touch. Right? They don't wanna talk to someone, a call center that tells them to restart the database. They wanna talk to someone that's been in it with them. Right? Someone that understands where they're coming from. And they don't wanna spend a lot of money for that, and I think that's where we come in. We're able to be what they need us to be and have the experts they need to make it work the way they want it to work.
[00:48:19] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today?
[00:48:35] Unknown:
Oh, man. I thought about that question for a bit, and I think about it daily. What is our biggest gap, and what can we do better? What can I do better? And I think you're gonna see more of solving that on our end, but globally, I hate the multiple SQL languages, and the companies that are like, this is something else QL. Right? We took the SQL standard, which has existed, I don't know, 50 years, and we just decided we don't like that one word, so we're gonna do it this way. Or, you know, like Oracle, like, we're just gonna add all these things that nobody else can use. Or the one that really gets me, whenever I use SQL Server, right? Select the top 50. How do I paginate?
You know, like, I hate these little quirks, and I wish that the standard meant more. I think that causes a lot of people to not wanna learn SQL, and that's why you have, I don't know, ORM number 501. Right? There's just, like, there's that, and the dream of lift and shift doesn't really exist because of that. Right? Like, you could ORM and abstract everything in your app, but you can't move it easily. And I think our legacy players, our legacy RDBMS, that's what they want. Right? They want it so that you can't move to open source. You can't make it cheaper to run your data or your products or innovate even.
And I think that's bad for innovation, and growth, and the web, and technology as a whole. I think that's why I value a lot of our open source commitment in being 1 of the only open source enterprise databases, in air quotes, because we're enterprise quality, but we're still committed to that. And I think that, plus providing the cloud things that people need, is neat. And I think GUIs are the future there. I think there's still a need for strong command lines, which you don't get with the people that have moved away from SQL.
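The pagination complaint is a good example of dialect drift, and a small Python sketch shows how the same "page 3, 50 rows per page" request turns into different SQL. The `paginate` helper and the dialect labels are hypothetical. The clause shapes are the commonly documented ones: `LIMIT ... OFFSET ...` for MariaDB and MySQL, and `OFFSET ... ROWS FETCH NEXT ... ROWS ONLY` for the SQL standard and newer SQL Server versions, both of which expect the query to already carry an `ORDER BY`.

```python
def paginate(base_query, page, page_size, dialect):
    """Append a dialect-specific pagination clause to an already ordered query."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if dialect == "mariadb":
        return f"{base_query} LIMIT {page_size} OFFSET {offset}"
    if dialect in ("standard", "sqlserver"):
        return f"{base_query} OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY"
    raise ValueError(f"unsupported dialect: {dialect}")


base = "SELECT id FROM orders ORDER BY id"
mariadb_page = paginate(base, 3, 50, "mariadb")
standard_page = paginate(base, 3, 50, "standard")
```

SQL Server's older `SELECT TOP (50)` form has no offset at all, which is presumably the "how do I paginate?" frustration, and it is also why ORMs end up carrying per-dialect code paths like this one.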
[00:50:50] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing at MariaDB and your experience of being with the project ever since its early days. So appreciate all of the time and energy that you and everyone at MariaDB is doing to help make it easier to be able to store and work with data at varying scales and use cases. So thank you again for that, and I hope you enjoy the rest of your day. Yeah. Thank you for having me and letting me geek out. Always.
[00:51:23] Unknown:
Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts, and just tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Manjot Singh: Introduction and Background
Overview of MariaDB and Its History
Maintaining Compatibility with MySQL
Unique Features of MariaDB
MariaDB's Commercial Offerings: Xpand and SkySQL
Future of Databases in the Cloud
Serverless Databases and Automation
Misconceptions and Misuses of MariaDB
Business Logic in the Database vs. Application
Interesting Features and Use Cases of MariaDB
Lessons Learned and Future Plans
Customer Experience and Support
Biggest Gaps in Data Management Tooling
Closing Remarks and Thank You