Going From Transactional To Analytical And Self-managed To Cloud On One Database With MariaDB

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode.

With their new managed database service, you can launch a production ready MySQL,

Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs.

Go to dataengineeringpodcast.com/linode

today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show.

You wake up to a Slack message from your CEO who's upset because the company's revenue dashboard is broken. You're told to fix it before this morning's board meeting, which is just minutes away. Enter Metaplane, the industry's only self-serve data observability tool. In just a few clicks, you identify the issue's root cause, conduct an impact analysis, and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast.

Sign up for a free forever plan at dataengineeringpodcast.com/metaplane,

or try out their most advanced features with a 14-day free trial. And if you mention the podcast, you get a free In Data We Trust World Tour t-shirt.

Your host is Tobias Macey. And today, I'm interviewing Manjot Singh about MariaDB, one of the leading open source database engines, and the suite of commercial offerings that are available to build on top of that. So, Manjot, can you start by introducing yourself?

My name is Manjot Singh. I am field CTO at MariaDB.

I also lead our customer engineering department, which

sometimes I'll call them ninjas and rock stars, but they all hate those words, so I just call them our top level experts.

I'm really privileged to work with these really cool people.

Do you remember how you first got started working in data?

Yeah. That's actually a really interesting story.

I got into making websites, I think I was 13, and I learned off HTML Goodies, and I thought, you know, wow, I'm really smart.

Someone called me a script kiddie. I think it was the late nineties.

And from there, I wanted to figure out how to direct that. And so I started a hosting company with a friend in high school. He had Comcast, and we put a Linux box in his living room, and we tried to host some websites.

As you can tell, young kids, we weren't great at marketing.

But I did start working on Perl websites for my local Sikh temple, and they just needed me to host some media. I used flat files, and I realized around, I wanna say around '99, 2000, that files aren't fun. Making websites in Notepad isn't great.

And there was this cool new thing called MySQL.

And I think it was MySQL 3.23, which was probably the oldest version most people have run in production.

I set that up with this cool new language called PHP,

and I started just programming websites, making things for friends and people that would pay me, like, $20.

And

I went from there to creating a career, I think, in IT where I started as a sysadmin.

And I accidentally

fell back into actually using MySQL as a professional.

I got solicited by a headhunter,

and the headhunter goes,

have you ever used MySQL? And I was like, oh, yeah.

Tons of times. I made all these websites. I'm still maintaining a lot of them.

And she's like, great. You wanna do that as your professional career? And so I called one of my friends that worked at Sun, and they had just purchased MySQL. He goes, oh, here's all of the test material.

Just read this and you'll do fine. So I go to the interview,

I'm sitting in front of this panel.

And at the end of the panel, the guy's like, you're the only one here that's actually installed MySQL,

so we're probably gonna move forward with you.

They were interviewing Oracle DBAs. And from there, I...

It's definitely interesting that you've got such a long history with the kind of foundational technology and that you've now landed with the MariaDB project, since this is an outgrowth of that MySQL heritage. And so I'm wondering if you can give a bit of an overview, for people who aren't familiar, of what MariaDB is and maybe some of the story behind how it came to be its own entity. We don't need to dig into any sort of, like, the politics aspect of it, but, you know, just some of the history of how we came to be where we now have MariaDB and MySQL, where they used to be the same thing.

Yeah. And, actually, I learned some of this over a dinner with Monty a few years ago, and I'm sitting next to

others that had come up with him. They're like, yeah. We heard of this cool thing. I made the PHP connector, and I did this. And I was like,

this was the history of the Internet. Like, I'm like, you guys got together one night in college, and just created everything the Internet was built on. I was kinda starstruck, and it was pretty cool. I became a MySQL MariaDB fanboy

when I started going to conferences with that first job,

and

I learned quite a bit. Monty made this product named after, I think, his first daughter, My,

MySQL, or MySQL as they say in Europe,

and that's still a debate in the company, by the way.

He created this cool database, and he had a lot of great ideas, one of them being the pluggable storage engine, where you could just trade out the back end that the data's being stored on and have a completely different use case. But the SQL syntax, the language, the handler, all of that would be the same.

And I think that dream really led him to create this company and have success in selling it to one of the leading open source companies, Sun Microsystems.

I think

that

when it sold to Oracle, it was unexpected

by a lot of the team that was originally there. And I know this, like, third and fourth hand. Right? I wasn't there.

And for them, they still wanted to create that passion, that storage engine passion, because they saw Oracle owned Innobase, which became InnoDB, and they wanted to make that MySQL. Right? So it had the one storage engine.

And on the other hand,

we had so many other ideas.

They wanted to compete with Oracle's

enterprise database. They wanted to compete with other legacy RDBMSs, as I'll call them. Right?

And I think

that dream of having a really flexible

open source database that doesn't

necessarily have the chance to be closed source, like, for example, Oracle's enterprise MySQL is closed source. Right? I think that brought them to that. And his second daughter, Maria, he already had a storage engine named after her, which was, I call it MyISAM 2, kind of. Right? It does replace a lot of MyISAM's use cases. He took that. He renamed that to Aria, took the M off, and then named the whole database MariaDB. And since then, we've really taken our own sort of view on it. We haven't been a fork in a long time.

For example, it's been our own code, which is open source, and we have a thriving

community around it.

Probably largely because of its association with Oracle and the direction that they were pushing MySQL, MariaDB

ended up being the kind of broadly adopted community option for

MySQL compatible workloads.

And

I know that MariaDB

has focused heavily on maintaining that compatibility even as the two projects have diverged, and it has been a number of years now. I don't know the exact number, but probably at least on the order of five or six. And I'm wondering how you have approached that challenge of being able to maintain compatibility between these two disparate projects that have, you know, widely diverging underlying implementation details now, but making it as smooth as possible for people to be able to migrate from MySQL to MariaDB without having to do a bunch of code changes accompanying it?

That's a hard one. So I think SQL/PSM, which is the syntax that's used in MySQL and MariaDB and compatibles, is

something that's important to a lot of us. We wanna maintain that compatibility.

Now there are a lot of places where we necessarily aren't compatible,

but in most cases, I would say 99.99%

of cases, you can drop in MariaDB.

It's important for us to make it easy for our community to use our products. Right?

And I think

we're gonna do more of that.

And so you have smart engineers at Oracle. You have smart engineers here, and at all the other forks. And a lot of times we come to the same conclusions.

You'll notice that MariaDB did make a lot of features first,

and eventually, they went into the other forks or into MySQL,

and

vice versa, of course.

And you'll find that a lot of times, they're approaching similar problems and solving them the same ways. And a lot of that is customer driven. Right? If we have customers that say, well, we wanna do this, or we wanna migrate from... we help them with that. Right? We make it as easy for them as possible, I think. And we do try to be syntactically compatible, just like we have with our Oracle layer, for example. You can actually use PL/SQL in MariaDB and ANSI SQL.

As far as the

ways that that shapes the overarching project, I'm wondering

if you see that kind of constraint of maintaining

broad compatibility with MySQL as

a benefit because it provides focus or as a,

you know, set of shackles because you want to go in your own direction, but you can't justifiably do that because then you're going to be kind of ditching a whole bunch of customers who you would otherwise be able

to support.

Yeah. And I think that's a fine line, and I think we've walked that fine line pretty well. I think if we go too far one way or the other, the community tells us, and we do what's best for our users and our customers. And

I think we do it better than MySQL.

And I'm not trying to put them down, but they have a very clear,

we support this version,

and we'll support replication from the last version and features from the last version.

For us,

we've actually struggled because we have so many versions in support most of the time, and I think that kinda led to some of our recent changes. But there was a time we had five GA versions just recently

that were all in support,

and we could replicate as far back as, like, 5.1.

I just helped a customer.

They went from 5.2 to 10.3, skipping all those versions, and we could replicate

from any of them.

So I feel like we've done a pretty good job of maintaining that replication,

that compatibility,

and I think we're unique in that aspect.

Kind of projecting forward a little bit, I'm curious if you see

anything that would potentially motivate you to say that, you know,

we want to chart our own path. We have, you know, diverged far enough. It has been enough time between the initial fork to where we are now that we don't feel bound

to MySQL as kind of defining our future trajectory. We want to go be our own thing. You know, if you want to migrate from MySQL, then you'll have to do it, you know, from this older version and then upgrade to where we are now

and just kind of, like, cut that tie.

I personally

feel like we're already there. I mean, we've been there for some time. You know, we took our own path by adding the Oracle compatibility.

We don't have some of the 5.7 features the way that MySQL has it. We did our own

methods, right? For a lot of these features.

There's differences in the way we do JSON and GIS.

They're very minor and they're easy to work around,

but I think our software architects would tell you that we did it better.

And so

in terms of the

places that you have gone your own or added your own capabilities, what are some of those features that are unique to MariaDB and aren't available either in MySQL or even, you know, the broad majority of other either open source or commercial

relational engines?

If we just look at MySQL and its forks,

storage engines.

Right? We have a storage engine that is just like a read-only engine, Aria, but it is transactional. We're also fully ACID compliant, which MySQL wasn't until they removed MyISAM.

We have ColumnStore. That's the big one, I think. Being able to do analytics and join them to your OLTP tables easily within the same command line. And perhaps with MyRocks, having a high write workload. I worked with a client that put in MyRocks, and they were doing millions of rows an hour, or probably even more than that. And they were struggling on InnoDB and other engines. We put in MyRocks, and they were doing great. So there's a lot of features there, but I think there's also

value in our other products such as Xpand. Right?

You put that together

with our enterprise server, you have a command line compatible

database.

Again, SQL/PSM, which is important to us. It's actually more similar to SQL/PSM and MySQL than MariaDB in some cases,

but it's distributed SQL. No more worrying about sharding, no more worrying about how do I add and remove nodes, create replicas. It just

does it. And that's pretty cool. That's something that, as a DBA, I spend a lot of time on. Right? Let's create a replica.

Let's copy the data over. Let's make sure all the IDs are correct. There's none of that with Xpand. And MaxScale actually brings that to enterprise server as well.

Digging more into sort of the commercial offerings and what you see as being

kind of commercial capabilities. I'm wondering if you can talk to some of the kind of broad use cases

and features that you're aiming for with those commercial options and how you think about, you know, if and or when

those capabilities,

you know, might land into sort of the the community distribution.

Yeah. So our features do go to enterprise first in in a lot of cases,

but they'll make their way to community. So far, it's been, like, I'd say a year or 2 lag, not not anything crazy.

I can't speak to the policy on that necessarily, but

I'm a big open source MySQL MariaDB fanboy, and I think that's what,

I guess, people like about me, but at the same time, we need to have something that is stable

and secure

and compliant

for our customers, and I think

that's really

the differentiator.

And eventually, all of that needs to be in community as well, just to make the world a better place, and I think a lot of it does. You know, even BSL, as much as some other companies, some of our competitors would like to put it down,

it's somewhat fair. Right? We

make our money off the code. It's it's like a patent. Right? We'll sell it, our customers benefit, our users benefit,

and a few years later, it's GPL, and someone can

innovate on that if they want to.

So digging more into the

kind

of capabilities of MariaDB

and particularly some of the enterprise oriented features you were talking about, Xpand.

I have definitely had many conversations with people who have banged their head against replication and sharding and trying to scale out their MySQL installations.

I'm wondering if you can maybe start with some of the underlying architectural concepts that go into Xpand and making it able to kind of scale out horizontally without all of that additional headache and manual fine tuning.

I come from being a DBA and consultant. Right? So not a C developer, so I'll speak to how it behaves. And I think Xpand, being a Paxos-based cluster, it has some features that are pretty unique. It's distributed SQL, it lets you do, I would say, a lot of parallel queries, and at most, you get, like, two hops. So it knows where the data is. It cuts it up into slices and it replicates it. So it copies these replicas, like, to each node, and you can actually control that. So you could have no replicas or you could have every node hold all the data. But if you're someone like Samsung, for example,

you might have 30, 40 nodes in a cluster, and you can actually locate the data in different places. I'm sure a lot of that is familiar with other distributed SQL products.

The slices are pretty cool. They hold little parts, I would say, of your table.

And those sets of rows, it knows how to track and join them in a parallel fashion and take advantage of all of the hardware across your cluster.

That's something unique that

a lot of MySQL MariaDB

users might find valuable. Right?

And you can add another node, and it'll automatically rebalance the cluster, so it has this feature called the rebalancer.

I think that's really cool as well. When I worked at HP Helion, for example, I helped put in one of the early versions of Elastic

into the cloud. And this reminds me somewhat of that, the way that it slices things up. But I think it's very sophisticated and smart, and that's attractive to me.
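The slicing, replica, and rebalancing behavior described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not Xpand's actual algorithm: the hash-based `slice_of`, the round-robin `place_slices`, the greedy `rebalance`, and the node names are all hypothetical, and the sketch only handles adding nodes.

```python
import hashlib
from collections import Counter


def slice_of(key, num_slices):
    """Map a row key to a slice number with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def place_slices(num_slices, nodes, copies):
    """Give every slice `copies` copies on distinct nodes, round-robin."""
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    return {
        s: [nodes[(s + r) % len(nodes)] for r in range(copies)]
        for s in range(num_slices)
    }


def load(placement, nodes):
    """Count how many slice copies each node holds (0 for empty nodes)."""
    counts = Counter({node: 0 for node in nodes})
    for owners in placement.values():
        counts.update(owners)
    return counts


def rebalance(placement, nodes):
    """Move copies from the busiest node to the idlest until loads differ by at most 1."""
    placement = {s: list(owners) for s, owners in placement.items()}
    while True:
        counts = load(placement, nodes)
        busiest = max(nodes, key=lambda n: counts[n])
        idlest = min(nodes, key=lambda n: counts[n])
        if counts[busiest] - counts[idlest] <= 1:
            return placement
        for owners in placement.values():
            # Only move a copy if the target does not already hold this slice.
            if busiest in owners and idlest not in owners:
                owners[owners.index(busiest)] = idlest
                break
        else:
            return placement  # nothing movable; stop rather than loop forever


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    placement = place_slices(num_slices=12, nodes=nodes, copies=2)
    print("before:", dict(load(placement, nodes)))
    nodes.append("node4")  # add a node, then let the toy rebalancer spread copies
    placement = rebalance(placement, nodes)
    print("after: ", dict(load(placement, nodes)))
    print("row 'order-42' lives in slice", slice_of("order-42", 12))
```

With 12 slices and 2 copies each, every node starts with 8 copies on three nodes and ends with 6 on four. Setting `copies` to 1 models "no replicas", and setting it to the node count models every node holding all the data.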

Another one of the offerings is SkySQL,

which is serverless database as a service.

And I know that that is definitely a direction that a lot of people are moving to with the growth of cloud, with a lot more kind of cloud native

workloads where you don't want to have to think about where is the database, what is it running, how do I scale it. And so I'm curious if you can talk to

what that offering looks like and some of the capabilities

that that brings on top of the underlying MariaDB engine.

Our goal with SkySQL is security first. Right? One of our early PMs was, like,

secure by default. And that's actually been our take with enterprise server and our other products.

But SkySQL is really about enabling any workload at any scale,

but you're not having to deploy all these random other products. Right? I have to go over here so I can have documents. I have to go over here for column store. I have to go over here for analytics, and I have to go over here for my OLTP.

And with SkySQL, you can have a lot of that just in one command line. Right? And you can join across many of those features.

So any workload that's whether it's transactional or analytical,

and then any scale. So

Xpand is actually really powerful for us there because of that elastic scale out and scale in. We have one client that, every Black Friday, right, early on, before we had the ability to do this, we would just make their nodes bigger. Right? Black Friday's coming up. We need 128 cores or whatever. Actually, 72 cores for them, and I think that's the max. And we would give them many replicas,

and they would go through that, and then they'd say, size us down at the end of the year.

That's okay. And we have the Sky DBA offering, which is experts

that will run your database. That's pretty unique. Right? You don't have that with any other cloud. Like, hey. Can I get a DBA to come in and alter my table?

Well, you can with SkySQL,

and I used to lead that team. These are really bright guys and gals, actually. And

I think that is powerful. And now with Xpand, we can just have their nodes grow and shrink.

They're very elastic when they have that type of workload happening.

And then we're adding things like geospatial APIs,

it's kind of data as a service, but it has a lot more going on, and analytics over

your cloud storage, cloud buckets, things like that. So

I think the value of having database

experts and data experts

run your database as a service

and more

is pretty exciting and pretty unique. And that's kind of what excites me to be here a lot of the time, is there's so much potential

in having this powerful,

automatic,

but not automatic because you can pull in experts,

database product.

Data teams are increasingly under pressure to deliver.

According to a recent survey by Ascend.io, 95% reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark and can be deployed in AWS, Azure, or GCP.

Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $5,000 when you become a customer.

For a number of years, you know, database in the cloud has largely looked like things like

Amazon RDS or Google Cloud SQL, where it's, yes, we will run that for you and we'll automate some of those kind of DBA options of

managing, you know, having a fit active failover in a different zone or,

you know, managing that replication, but it's not going to resize it for you or scale it out unless you're using one of their kind of proprietary options.

And I wonder what you see

from having worked with SkySQL and providing that as a service. I'm curious what you see as some of the future of

databases in the cloud, you know, where we have had this

paradigm of that RDS, Cloud SQL, and now we're exploring a lot more with these dynamically scalable, largely serverless, kind of just throw the data in and it does what I want it to situations.

Yeah. And that's definitely something that we're working on. We have a few things up our sleeves on that. But I think workload automation,

tuning

automatically,

indexing automatically, these types of things are the future of database as a service.

You can do all of that, but if you don't have a person in a lot of cases, and this is true, I think a data scientist would tell you that as well,

there's cases where the machine learning will miss,

or it'll just be flat out wrong. And having an expert to fall back on,

which

I'll admit is not scalable,

but we do a pretty good job, I think a pretty fantastic job, with Fortune 50s on this.

Having that expert step in and remind the automation where things need to be, I think, is value.

And, you know, you could have this new generation relational database, you could have something that's supporting millions of users,

but the wrong alter at the wrong time,

even if it's done by smart automation,

could take down

your cloud, or your mobile app, or your website. And

I think that's

really the differentiator there. So, you know, serverless is coming. Serverless is this new hot,

but it's still someone else's computer.

You know what I mean? Always.

Yeah.

You mean to say that it doesn't just mean that it's all happening mystically in the cloud? There are no machines happening under there? I don't have to worry about actual machine failures?

It's all magic. It's all magic. If only.

Powered by unicorns. Yeah.

I mean, I would say

my services team wouldn't be so successful. My enterprise architects that work with these really large companies

would be out of jobs if other clouds did it as well as we did.

And so

another interesting element that I really wanna dig into, particularly given the focus of this podcast, is the column store capability that you're offering where you do have the ability to treat the data with that HTAP paradigm.

There are some other systems out there that offer that capability,

you know, to varying levels of success. Most of them aren't open source. I'm wondering if you can describe some of the kind of architectural

fundamentals that go into

providing that columnar view on the data and being able to map from the transactional into the analytical workspace at the engine level?

Yeah. So ColumnStore, it stores each column as separate files in the file system, or tablespaces, or whatever you wanna call them. These are stored usually on cloud buckets, cloud services, and we also have the ability to store them on your local network, or S3-compatible storage.

It's all replicated in that analytical cluster, and you can actually have your front end MariaDB,

which

many people still see as that old

database,

but have not seen the new modern MariaDB.

Obviously, it's evolved. Right? A lot of people have been using whatever in their enterprise for years, and are like, MariaDB 10.5, 10.6?

We're on 10.11.

Right? And

we have all these new modern features

that developers would love,

you know, MongoDB compatibility,

Oracle compatibility.

Right? And I think that having that going through that

interface, I would say,

of a highly available

database

with a standardized,

you know, the MySQL, SQL/PSM, standardized SQL interface that can actually join that data from those storage engines, from columnar, where you can ask, what are all of my thermometers across the state saying, and join that with your local, you know, DB data, and perhaps even shard it with Spider

to many databases

behind the scenes. It's just it's amazing, the potential.

And I've talked about a lot of that with some of our customers, and some of them are using them in those really novel ways,

but we've also added an analytical index similarly in Xpand.

So you get that it's shared nothing with Xpand, right? So there's a little bit of an advantage there, and you can actually run queries with indexes that are stored in a more columnar fashion there. I think it's the only distributed SQL database to do that. On the other hand, ColumnStore is really great for those analytical, just straight up analytical queries, OLAP.

We do have quite a few BI use cases.

So ColumnStore

gives us a powerful way to move data from OLTP to OLAP

live.

Right? Stream it. And I think you join that with a lot of the tools we all know and love.

It's cool.

You don't have to go learn something else like Vertica.
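To make the column-per-file idea above a bit more concrete, here is a minimal Python sketch of why a columnar layout suits analytical scans. It is not how ColumnStore is implemented; the sensor rows and the helper names `to_columns`, `column_average`, and `average_by` are hypothetical, and in-memory lists stand in for the per-column files.

```python
from collections import defaultdict

# Hypothetical sensor readings in row-oriented (OLTP-style) form.
ROWS = [
    {"sensor_id": 1, "city": "Austin", "temp_c": 30.0},
    {"sensor_id": 2, "city": "Dallas", "temp_c": 20.0},
    {"sensor_id": 3, "city": "Austin", "temp_c": 25.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def column_average(columns, name):
    """Aggregate by scanning a single column; other columns are never read."""
    values = columns[name]
    return sum(values) / len(values)


def average_by(columns, key, value):
    """Group one column by another, still touching only two columns."""
    totals = defaultdict(lambda: [0.0, 0])
    for k, v in zip(columns[key], columns[value]):
        totals[k][0] += v
        totals[k][1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(column_average(cols, "temp_c"))
    print(average_by(cols, "city", "temp_c"))
```

The point of the layout is that an average over `temp_c` reads one list instead of every field of every row, which is the access pattern analytical queries tend to have.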

See, back to your point of you have people who have been using MariaDB for years now, and they don't realize the capabilities that have been added in. I'm curious how you think about that kind of customer communication and customer education of, oh, you want to do that. Well, it already does that out of the box. You don't need another system.

You know, we're just kind of surfacing that information for people who do start to kind of over architect their systems because they don't fully understand or they don't take the time to reacquaint themselves with what new features are being introduced in each release.

I've been in shops where they have a lot of people that are just out of college,

and

they'll be like, well, the front end's on MongoDB, and then we got this cloud service to do our analytics, and we're running a messaging queue, and we're doing this and that.

Cool.

Then they'll call us back after they have

thousands of customers.

We can't handle the workload.

What do we do?

Well, first, stop storing all of your data a million times so that it renders faster.

Because you can still do that with an OLTP database.

It might not be hot and cool,

but it's survived this long because it works. Right?

And I think you start there, and now you're like, well, why are you paying

x amount of money to

Oracle, Amazon, whoever else? You already have your analytics right there.

Just create table, engine equals ColumnStore. Right?

Or our orders table is going crazy. And I can't tell you how many companies have said those words. Our orders table, we can't handle the writes. What do we do?

Oh, okay. Alter table, engine equals MyRocks, or copy the data to Xpand

and just add nodes as you need them. It's really

simple if you know the products and you remember that MariaDB is modern, which people are kinda shocked. Like, I'll walk in, and I'll just do a quick brown bag. They'll be like,

what?

We've been using 10.0 or 10.1 for so long, or even 5.5.

You do all these things? I'll be like, yeah, let's upgrade.

Let's upgrade. And now you have magic. Right?

Obviously, you have to teach them how to use it. What are the benefits? What are the pitfalls?

I'm really big on that, being sort of honest on that

in terms of, I guess, what's important

to our customers.

You see that, like, where we run, for example, the ServiceNow Cloud, which is obviously not the young company I was talking about, but

they have a lot of MariaDB running,

quite a lot. And

I think that's one of our cool use cases where they can actually just make it do anything

because of MariaDB.
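The "create table, engine equals column store" and "alter table, engine equals MyRocks" remarks above map onto ordinary DDL. The Python sketch below only builds the statement text and does not connect to a server. It assumes the ColumnStore and MyRocks plugins are installed and registered under the engine names `ColumnStore` and `ROCKSDB` (worth confirming with `SHOW ENGINES` on your own server); the table names, column names, and helper functions are hypothetical.

```python
import re

# Conservative identifier check so the sketch never interpolates odd input.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name):
    """Return the name unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def create_table_sql(table, columns, engine):
    """Build CREATE TABLE text; column types are passed through unvalidated."""
    cols = ", ".join(f"{_ident(col)} {sql_type}" for col, sql_type in columns.items())
    return f"CREATE TABLE {_ident(table)} ({cols}) ENGINE={_ident(engine)}"


def alter_engine_sql(table, engine):
    """Build the statement that rebuilds an existing table on another engine."""
    return f"ALTER TABLE {_ident(table)} ENGINE={_ident(engine)}"


if __name__ == "__main__":
    # Analytics table on the columnar engine (assumed name: ColumnStore).
    print(create_table_sql(
        "order_facts",
        {"order_id": "BIGINT", "total": "DECIMAL(10,2)"},
        "ColumnStore",
    ))
    # Write-heavy table moved to MyRocks (assumed engine name: ROCKSDB).
    print(alter_engine_sql("orders", "ROCKSDB"))
```

Changing the engine of a large table rebuilds it, so on a busy orders table this is something to schedule rather than run casually.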

Digging more into the kind

of data architecture aspect of an engine like MariaDB, where it does have these pluggable storage capabilities.

It is adding new kind of data primitives in the engine. You were talking about GIS. You know, there's the question of JSON.

I'm wondering what are some of the kind of interesting misconceptions

or misuses or underuses of MariaDB that you've seen for people who are trying to build,

you know, more complicated applications or, I guess, more commonly, trying to use MariaDB as their transactional store

and then replicate that into some other system for being able to power analytical workloads or being able to build derived data products on top of those transactional resources?

I think the biggest one is JSON. Right? I have unstructured data. Okay. Well, we can't put that in MariaDB, let's go to whatever document DB they wanna go to. Right? And I'll be like, well, just put it in this table and call it with JSON functions.

Run it through our MongoDB NoSQL router. It's compatible with MongoDB.

There's lots of options there, right, for unstructured data.

And

you show them that, and that they don't have to duplicate their data. Now you've saved money on storage.

You're no longer having to maintain many open connections out of your

unique ability.
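As a rough illustration of "put it in this table and call it with JSON functions", the Python sketch below keeps JSON documents in an ordinary relational column and filters on a field inside them. SQLite from the standard library stands in for MariaDB so the example runs anywhere, and the small `json_value` function registered here is a simplified imitation of a SQL JSON function such as MariaDB's `JSON_VALUE`, supporting only `$.key` paths. The `products` table and its contents are hypothetical.

```python
import json
import sqlite3


def json_value(doc, path):
    """Tiny stand-in for a SQL JSON function: supports only '$.key' paths."""
    if not path.startswith("$."):
        raise ValueError("only '$.key' paths are supported in this sketch")
    return json.loads(doc).get(path[2:])


def build_demo():
    """Create an in-memory table with a relational part and a JSON column."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("json_value", 2, json_value)
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, attrs TEXT)"
    )
    rows = [
        (1, "kettle", json.dumps({"color": "red", "watts": 1500})),
        (2, "lamp", json.dumps({"color": "blue"})),
        (3, "toaster", json.dumps({"color": "red", "slots": 2})),
    ]
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    return conn


def names_with_color(conn, color):
    """Filter on a field inside the JSON document using plain SQL."""
    cur = conn.execute(
        "SELECT name FROM products "
        "WHERE json_value(attrs, '$.color') = ? ORDER BY id",
        (color,),
    )
    return [name for (name,) in cur]


if __name__ == "__main__":
    print(names_with_color(build_demo(), "red"))  # prints ['kettle', 'toaster']
```

The structured columns and the free-form attributes live in one table, so there is no second copy of the data in a separate document store to keep in sync.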

Some of the other unique ones that are coming to mind, I was thinking of a drug store I've worked with.

They actually have an application that uses MariaDB

for the front end web page, but they render a lot of things out of a document store. They have their back end shopping cart in SQL Server.

They store their customer data in Oracle.

As an engineer

and an architect, it just doesn't make sense to me. Why are you spending so much money with so many vendors,

and

and having to maintain SMEs and developers for all of these products

when you could just use 1 of them. Right? And that's my struggle

as a leader. Right? I don't know that as an engineering leader, I would make that choice.

I find it difficult to justify something like that. They were also putting in an analytics store, and I was like,

you have column store here.

They were creating connectors to be able to access other databases for DB links. I was like, we have Spider. You just connect the databases and

copy the data over or access it directly. I think

there's a lot of education or a lot of things that people

misconstrue because maybe they used it once in college,

and maybe they used the old one. Like, if you went to current versions of Ubuntu or RHEL, sometimes you'll get 5.5 when you do yum install MariaDB or MySQL. You get an old version because you may not have added our repos.

And that creates a misconception, I think. We're not the WordPress database. Right?

We're the database that's running Samsung Cloud,

ServiceNow's cloud. Right? IHME,

we were all looking at it during the COVID pandemic. Right?

How many cases are there? What are the estimates? Oh my god. Bill Gates said this many people are gonna die. Remember April 2020. Right?

That was all powered by MariaDB ColumnStore.

That wasn't

Microsoft doing something powerful.

This came from IHME, which is part of the Bill and Melinda Gates Foundation, and they're one of our big customers. They use our SkyDBAs.

They help them manage

that columnar data. There's a lot of cases I see with customers. I've seen a lot of similar themes

with the hundreds, maybe thousands, of customers I've worked with in my consulting career.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control.

Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools.

Sign up for free today at dataengineeringpodcast.com/rudder.

Another interesting aspect of just kind of the way that databases are used is the question of how much of the business logic should live in the database versus how much of it lives in the application, and the database is just there for dumb storage. And I'm

curious what you see as some of the

ways to gauge that balance based on kind of the use case and the capabilities of the underlying engine.

It's an inclusive or. I was gonna say yes.

So it's really dependent. I think when I was early in my consulting career,

everybody had to normalize their data. Actually, early in my DBA career. Normalize all your data,

put all of your logic in your app, take it out of the database, and we're happy.

I think I've learned there's a lot of good cases for flattening your data. There's a lot of good cases for triggers and

stored procedures.

I would say you don't wanna put, like, a 2,000-line stored procedure in your database anywhere, but especially not in Xpand. It's meant to be a really fast

OLTP engine with that parallel replication,

right, and higher throughput.

But if you're doing ColumnStore, put that 2,000-line stored procedure in MariaDB Enterprise.

It'll handle it. And write it in PL/SQL if you want. You can do packages.

Right?

So

probably with InnoDB

as a storage engine, I'd find a balance there. I think 500 lines is egregious.

I've installed Rundeck in a lot of companies, or said, hey. You have Jenkins. Right?

Or a cron job with Bash. You know, there's other ways to do things

or use your application, create a microservice.

There's a lot of ways that are efficient, and I would

say something I learned from one of my mentors was

it depends.

Yes. The ubiquitous

answer.

Exactly.

Yep. With a Russian accent.

Even better.

And

maybe digging a bit more into some of the, I guess, data-type-specific or kind of logical capabilities

of MariaDB,

I'm wondering what are some of the

ones that you find most interesting or most compelling,

and either ones that are upcoming or that were recently introduced, or things that have been around for a while but that people often overlook or don't realize are there, just as far as, like, oh, I didn't know you could just do a, you know, select, call a function name, and it's magical.

That's hard. So on my Twitter, a lot of times, I'll do a mariadb doc of the day

hashtag,

because I learn every day.

I think

I would not be successful if I didn't constantly ask questions

and learn on a daily basis.

I'll start with as a company,

Samsung Cloud, like 10,000,000,000 daily requests on Xpand.

Like, I've seen the monitoring graph. I was like,

oh my god.

Supercat. Right? They use Xpand for games, and it handles these crazy workload increases when games are popular.

Or

a Fortune 500 financial company, you know, maybe you have your retirement with them.

$2,000,000,000,000 in assets are on Xpand.

And I think that type of thing is like, wow.

Right? Those are interesting points to me that kind of make me proud to work here

beyond being the fanboy and being like, oh my god. I work with the people that used to be my mentors.

Right?

That's pretty cool. But on the feature side,

like, I just learn amazing things. Like,

we have a procedure that'll scan your table and be like,

that VARCHAR(255),

you've never had more than 50 characters in it.

It should be 50 characters.

Right?
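As a rough illustration of the column-sizing check described here, the sketch below measures the longest value actually stored in a column and compares it with the declared width. It is not MariaDB's implementation: the function name `suggest_varchar_width`, the `customers` table, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real server only so the example runs on its own. MariaDB does have a `PROCEDURE ANALYSE` clause that suggests column types, but the speaker doesn't name the feature, so treating it as the one meant here is an assumption.

```python
import sqlite3


def suggest_varchar_width(conn, table, column, declared_width):
    """Return (longest observed length, suggested width) for a text column.

    Identifiers cannot be bound as query parameters, so this sketch assumes
    `table` and `column` come from trusted code, never from user input.
    """
    row = conn.execute(
        f"SELECT COALESCE(MAX(LENGTH({column})), 0) FROM {table}"
    ).fetchone()
    max_len = row[0]
    # Never suggest more than the declared width, and never less than 1.
    suggested = min(declared_width, max(max_len, 1))
    return max_len, suggested


# Hypothetical sample data: a VARCHAR(255) column holding short names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name VARCHAR(255))")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Ada",), ("Grace Hopper",), ("Linus",)],
)

observed, suggested = suggest_varchar_width(conn, "customers", "name", 255)
print(f"declared 255, longest value {observed}, suggested width {suggested}")
```

In practice you would leave headroom above the observed maximum rather than shrinking the column to exactly the longest value seen so far.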

Or

I guess the power of MaxScale. Like, just

every day, I'm like, oh, it does that?

That's neat.

I mean, it has an IDE GUI. Did you know that? A lot of people don't know that. It has the ability to

split your reads and writes, but also rewrite them if you want it to. You could be like, every time someone types select 1, make it select 2.
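To make the idea of read/write splitting plus query rewriting concrete, here is a toy sketch. It is not MaxScale's API or configuration format: `REWRITE_RULES`, `rewrite`, `route`, and the backend names `"primary"` and `"replica"` are all hypothetical, and a real proxy also has to account for transactions, session state, and statements that only look like reads.

```python
import re

# Hypothetical rewrite rule mirroring the spoken example:
# every "select 1" becomes "SELECT 2".
REWRITE_RULES = [
    (re.compile(r"^\s*select\s+1\s*$", re.IGNORECASE), "SELECT 2"),
]


def rewrite(sql):
    """Return the replacement for the first matching rule, else the query unchanged."""
    for pattern, replacement in REWRITE_RULES:
        if pattern.match(sql):
            return replacement
    return sql


def route(sql):
    """Send plain SELECT statements to a replica and everything else to the primary."""
    words = sql.split()
    first = words[0].upper() if words else ""
    return "replica" if first == "SELECT" else "primary"


for query in ["select 1", "SELECT * FROM orders", "UPDATE orders SET paid = 1"]:
    final = rewrite(query)
    print(f"{query!r} -> {final!r} on {route(final)}")
```

Classifying by the first keyword alone is the simplest possible rule, and it is used here only to keep the sketch short.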

You can have it pipe things to Kafka.

Every query can just be mirrored there. There's just, like, all these cool things that I learn about our products that I think are really cool. And that's where in my field of experts where I'm like,

did you know this? He's like, yeah. Yeah. I've been dealing with that for a while.

And I'll go to another, and it's just really cool. It's really cool to be in a place that values

learning,

and I think

teaching that to our customers and our users

is really exciting. I think you could see some of that passion in my YouTube videos.

So

sometimes I'll be asked a question, I'll be like, I'm gonna go make a YouTube video about that.

It's pretty neat. I'm actually doing some more next week.

We've touched a lot on some of the kind of interesting or innovative or unexpected ways that you've seen MariaDB used. Are there any other examples that you wanna call out?

I think I've talked about a lot of our use cases with the larger ones,

but I've seen small use cases. I think that's pretty unique. Right?

We do have embedded use cases still. I think MySQL started that way. MariaDB continues that.

So you might be using a router that has MariaDB running on it right now.

You might be using

some other product that has embedded Linux with MariaDB.

I think that's pretty cool. I'm not saying we're SQLite,

Right? But there's a lot of use cases like that I find really interesting because people have thought outside the box

and gotten it to do something

interesting.

Like that temperature thing that I mentioned earlier. Like, we do have some users that

I think my thermostat actually,

my smart thermostat uses ColumnStore in the back.

So I think that's pretty cool. Like, I also think it's weird that the smart thermostat company happens to know the temperature in my house,

but, you know

Yep.

And

in your experience of working at MariaDB

and helping to direct the products and understand the use cases and technical requirements of your customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?

I think

coming up as a consultant and a fanboy,

again, I keep saying that,

but

I had a lot of faith in

some of the products, especially

what certain things can do.

And I think one of the hard lessons I learned is somebody will use our products in a unique way, and

I'll find if they're using an older version where we haven't necessarily squashed something,

we have to

redirect them to a newer modern version. And

one of the forks, for example,

had a bug with

losing data on backup,

and that wasn't a MariaDB product.

But they were coming to MariaDB using those backups to restore.

And that meant

that we had to think outside the box

and

create an enterprise

backup solution

that helped that customer

succeed. And I think

that kind of led into, well,

let's make an enterprise backup solution, and that's what we did.

But you don't know when someone's gonna run 2,000 rename tables a second. You just don't. And that was the case there.

You don't know if somebody's

going to

lock 20 tables

because they thought they were locking one row.

It's a hard lesson, and I think understanding that and asking questions and learning,

and then continuing to make your products better and meet those use cases, I think is really cool. So that's why, like, ease of use and the developer experience is really important to us right now.

We have a developer advocate,

and

he's been helping us work through,

well, how can we make it easier to use our products? You know, we're trying to move away from manual sharding, Xpand. Right? We're trying to make it easier to install and configure

our databases. With SkySQL, we wanna make it so you're not even worrying about that. You're just worrying about, is my data there? Can I access it? Is it fast?

Right? Is it durable?

Is it atomic? You know, those are, I think, more important

things for developers and users than,

oh, man, I gotta take a backup. I gotta do this.

That's kinda where we're at. I guess I look at what did I hate as a DBA?

And I talk to our customers and users. I'll be like, do you hate that too? Okay. And I go talk to our PMs, and we get it done.

So

I think there's that, you know, we still listen to, like,

what the needs are, and what do they need out of a cloud? And that might not be something they can get from a hyperscaler.

Right? We have a big advantage in that

we own our products. We can make changes. We can add features or fix bugs, and that's all surfaced in SkySQL, actually. So SkySQL has been a big benefit because

more often than not, we'll see a feature need or

a bug in SkySQL

before our customers see it.

And I think over the last few years, you'll see that we've done a lot of innovation

because

we need to make our fully managed cloud

better. And that's not something necessarily the hyperscalers

can do with all the open source projects that they've kind of just lifted and

put into their product suite.

For people who are looking for the storage system that they want to use to power their data use case, whether it's a transactional application, just a CRUD app, or a, you know, high scale data analytics use case or anything in between, what are the cases where MariaDB is the wrong choice?

We don't have a messaging queue.

I like Kafka. There's a lot of options out there though. We work well with Kafka. I would say

that's a hard one. That's a really hard one for me to answer because, again, I'll say the word fanboy again,

there's a lot of times where

I don't

necessarily

see why someone would do something.

There are definitely

reasons to use Oracle or SQL Server or other databases,

but I haven't seen them in a lot of our customers, I'll be honest.

And that's because

we have that flexibility

in our use cases and our storage.

You can't run SharePoint on MariaDB.

That's one that's difficult,

of course. I think there's a lot of Oracle products that require Oracle database.

That makes it difficult.

So you probably can't do that.

But I've had people try

to varying degrees of success.

I would say that's a hard one. That's a really hard one for me.

And as you continue to work on MariaDB

and help to grow its capabilities and work with your customers to understand their requirements and use cases, what are some of the things you have planned for the near to medium term, or any problem areas that you're excited to dig into?

I'm excited that we're gonna go public soon. But with our customers

and our products, I'm excited to kinda dig into geospatial. It's not something that

I've done a lot of,

but I've recently been speaking with the CubeWerx team from our recent acquisition, and these are guys that worked on, like, Oracle's

geospatial features, like, 30 years ago, and now they've created something, like, really cool.

It's something I'm excited to learn about, and I think, again, you can see, like, my love of learning. I think getting into that space and wanting to become an expert in that is really cool, but I'm also really excited about, like, our command line tools. I mean, it's not sexy, but,

like, making things easier for people's day to day is, like, my passion, I think.

Making it so that our users

aren't wondering why we made an engineering choice,

I think is really important.

Because

I don't want people to think

of that old thing that I'm nostalgic for. I want them to think of the future

of database, right, when they think of us. And so you do have cool things like that, geospatial and the analytics and whatnot. But I think Xpand is gonna do some really cool things, and it already is. And that's something that I've been excited about since we

brought Xpand in.

But now that we have it in SkySQL and we have these really cool customers, I think we're gonna see more of that.

Are there any other aspects of the MariaDB

project or the overall product suite or the business itself that we didn't discuss yet that you'd like to cover before we close out the show?

I would say that a lot of people

want a great quality customer experience

with

experts that know what they're talking about on the first touch.

Right? They don't wanna talk to someone, a call center that tells them to restart the database.

They wanna talk to someone that's

been in it with them. Right? Someone that understands where they're coming from.

And they don't wanna spend a lot of money for that, and I think

that's where we come in.

We're able to

be what they need us to be and have the experts they need to make it work the way they want it to work.

Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today?

Oh, man.

I thought about that question for a bit, and I think about it daily. What is our biggest gap, and what can we do better? What can I do better?

And I think you're gonna see more of solving that on our end, but globally,

I hate the multiple SQL languages,

and

the companies that are like,

this is something else QL.

Right?

We took the SQL standard, which has existed, I don't know, 50 years,

and we just decided we don't like that 1 word, so we're gonna do it this way.

Or,

you know, like Oracle, like, we're just gonna add all these things that nobody else can use.

Or

the one that really gets me, whenever I use SQL Server, right? Select the top 50. How do I paginate?

You know, like, I hate these little quirks, and I wish that

the standard meant more.
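The pagination complaint can be made concrete with a small sketch that builds the same "page 2, 50 rows per page" query for two dialects. The helper `paginate` and its dialect labels are hypothetical names chosen for illustration. The generated clauses are the commonly documented forms: `LIMIT ... OFFSET ...` for MariaDB and `OFFSET ... ROWS FETCH NEXT ... ROWS ONLY` for SQL Server 2012 and later. The sketch interpolates values directly into the string, so it assumes trusted input and is not production query building.

```python
def paginate(base_query, order_by, page, page_size, dialect):
    """Append dialect-specific pagination to a query that has no ORDER BY yet.

    `page` is 1-based. Only two dialects are covered in this sketch.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    ordered = f"{base_query} ORDER BY {order_by}"
    if dialect == "mariadb":
        return f"{ordered} LIMIT {page_size} OFFSET {offset}"
    if dialect == "sqlserver":
        # SQL Server only accepts OFFSET/FETCH together with ORDER BY.
        return f"{ordered} OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY"
    raise ValueError(f"unsupported dialect: {dialect}")


base = "SELECT id, name FROM customers"
for dialect in ("mariadb", "sqlserver"):
    print(dialect, "->", paginate(base, "id", page=2, page_size=50, dialect=dialect))
```

The same logical request needs different text per engine, which is the portability cost being described. `SELECT TOP 50` by itself only covers the first page, since `TOP` has no offset.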

I think that causes a lot of people to not wanna learn SQL, and that's why you have, I don't know, ORM number

501.

Right? There's just, like,

there's that, and the dream of lift and shift doesn't really exist because of that. Right?

Like, you could ORM and abstract everything in your app,

but you can't move it easily.

And I think our legacy players,

our legacy RDBMS,

that's what

they want. Right? They want it so that you can't move to open source. You can't

make it cheaper

to run your data or your products or innovate even.

And I think that's bad for innovation,

and growth, and the web, and technology as a whole.

I think that's why I value

a lot of our open source commitment in being one of the only open source

enterprise databases, in air quotes,

because we're enterprise quality,

but we're still committed to that. And I think that,

plus providing the cloud things that people need, is neat. And I think

GUIs

are the future there.

I think there's still a need for strong command lines, which you don't get with the people that have moved away from SQL.

Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing at MariaDB

and your experience of being with the project ever since its early days. I appreciate all of the time and energy that you and everyone at MariaDB are putting in to help make it easier to store and work with data at varying scales and use cases. So thank you again for that, and I hope you enjoy the rest of your day.

Yeah. Thank you for having me and letting me geek out.

Always.

Thank you for listening. Don't forget to check out our other shows, Podcast.__init__,

which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning podcast,

which helps you go from idea to production with machine learning.

Visit the site at dataengineeringpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com

with your story. And to help other people find the show, please leave a review on Apple Podcasts, and just tell your friends and coworkers.
