Summary
In this episode of the Data Engineering Podcast Roman Gershman, CTO and founder of Dragonfly DB, explores the development and impact of high-speed in-memory databases. Roman shares his experience creating a more efficient alternative to Redis, focusing on performance gains, scalability, and cost efficiency, while addressing limitations such as high throughput and low latency scenarios. He explains how Dragonfly DB solves operational complexities for users and delves into its technical aspects, including maintaining compatibility with Redis while innovating on memory efficiency. Roman discusses the importance of cost efficiency and operational simplicity in driving adoption and shares insights on the broader ecosystem of in-memory data stores, future directions like SSD tiering and vector search capabilities, and the lessons learned from building a new database engine.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Your host is Tobias Macey and today I'm interviewing Roman Gershman about building a high-speed in-memory database and the impact of the performance gains on data applications
- Introduction
- How did you get involved in the area of data management?
- Can you describe what DragonflyDB is and the story behind it?
- What is the core problem/use case that is solved by making a "faster Redis"?
- The other major player in the high performance key/value database space is Aerospike. What are the heuristics that an engineer should use to determine whether to use that vs. Dragonfly/Redis?
- Common use cases for Redis involve application caches and queueing (e.g. Celery/RQ). What are some of the other applications that you have seen Redis/Dragonfly used for, particularly in data engineering use cases?
- There is a piece of tribal wisdom that it takes 10 years for a database to iron out all of the kinks. At the same time, there have been substantial investments in commoditizing the underlying components of database engines. Can you describe how you approached the implementation of DragonflyDB to arrive at a functional and reliable implementation?
- What are the architectural elements that contribute to the performance and scalability benefits of Dragonfly?
- How have the design and goals of the system changed since you first started working on it?
- For teams who migrate from Redis to Dragonfly, beyond the cost savings what are some of the ways that it changes the ways that they think about their overall system design?
- What are the most interesting, innovative, or unexpected ways that you have seen Dragonfly used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on DragonflyDB?
- When is DragonflyDB the wrong choice?
- What do you have planned for the future of DragonflyDB?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- DragonflyDB
- Redis
- ElastiCache
- Valkey
- Aerospike
- Laravel
- Sidekiq
- Celery
- Seastar Framework
- Shared-Nothing Architecture
- io_uring
- midi-redis
- Dunning-Kruger Effect
- Rust
[00:00:11]
Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI powered migration agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey, and today I'm interviewing Roman Gershman about building a high speed in memory database and the impact of the performance gains on data applications. So, Roman, can you start by introducing yourself?
[00:01:00] Roman Gershman:
Nice meeting you, Tobias. And I am Roman. I am the CTO of Dragonfly DB and also the original author of
[00:01:08] Tobias Macey:
Dragonfly project, hosted on GitHub. And do you remember how you first got started working in data?
[00:01:15] Roman Gershman:
Yeah. So I joined Google in 2007, in Haifa, Israel, and my first data oriented project was building Google Suggest, which is the service that everyone uses when they type their search queries in the search box and get all those suggestions. And, believe it or not, it hadn't been launched before that. Nowadays, we are all used to it. But back then, it was a 20% project of someone in Google, and we took it on ourselves to productionize it. We launched it on YouTube, and the goal was to launch it on Google.com. And there, actually, I had my fair share of challenges of how you scale technology to Google scale, basically.
You need to support high throughput scenarios with very low latency. And, of course, it was an in memory domain with lots of preprocessing pipelines, etcetera, etcetera. And that was, again, my initial experience with scaling infrastructure.
[00:02:30] Tobias Macey:
And in terms of the Dragonfly project, can you give a bit of an overview about what it is and how it started and why you decided that it was worth putting the time and energy into
[00:02:42] Roman Gershman:
it? Yeah. Sure. So after Google, I moved to working in a startup called Ubimo. Actually, my manager and the CTO of that startup is Oded, who is now my cofounder in Dragonfly DB. And back then, when we left Google, we kinda didn't know anything about infrastructure outside of this camp of Google. And we started learning about all the building blocks and pieces that exist in the open source community. And quickly, we discovered Redis and started using it, and it was incredibly useful in our infrastructure stack. But, unfortunately, it was very painful to manage and scale. So we naively thought that we'd be able to take snapshots, and we couldn't do it just because our use case was high throughput.
Right, high write throughput. And it just went out of memory. And we tried to scale vertically because everyone on the Internet said that it's the fastest data store that exists, very scalable. And then, to my surprise, I discovered that it is not possible to scale it vertically. But we kept using it, and I kinda had this thought that maybe it is possible to make it better, but I kept this thought to myself. And then at some point, this startup was sold to a company, and I moved on to another job. And I saw that there was an opportunity in the AWS ElastiCache team in Israel. And I just wrote an email to the manager there, and I wrote to him that, hey, I'm super excited about Redis, and I think I have some ideas of how to improve it. I would be really happy to join the team. And it worked. I joined the ElastiCache team and became an engineer there.
And actually there, I had the chance and the privilege to learn about lots of use cases of how people use Redis in various ways. And really, Redis is super useful. It's a Swiss army knife of various data structures, and also of how people try to use it. And I saw that the challenges we experienced in our startup were very much similar to how others try to scale Redis and fail, the same pain points. And I just felt even more strongly that it is possible to improve the technology behind this incredible product. And I tried advocating this in the company, but it didn't succeed.
And at some point, we just decided to go our separate ways. I mean, I still have a very friendly relationship with Madelyn, who is the lead developer now with Valkey, and I enjoyed very much working with the team, but I decided, okay, why not try myself to do something better?
[00:06:13] Tobias Macey:
And you mentioned Valkey, which is currently the more open alternative to Redis since they changed their licensing. And the other major player that I'm aware of in the key value database space is Aerospike, which also focuses on high throughput, high speed. And I'm wondering what you see as the core problem that is solved or the capabilities that are unlocked by having a, quote, unquote, faster Redis.
[00:06:43] Roman Gershman:
Yeah. Sure. So, first of all, disclosure: I've never run Aerospike. Of course, I've heard about it. I think Aerospike, at least how I perceive it, is more in the persistent domain area. So there are lots of key value stores. And I think it is mostly useful for enterprise, very huge scale workloads that have maybe a lower throughput to workload size ratio. So maybe it can be, like, dozens of terabytes of data, but a relatively low throughput use case. With Redis or Valkey or Dragonfly, usually people use it for its very low latency characteristics, sub millisecond latencies, even for p99, and the throughput there can be relatively high.
In addition, as I said before, Redis and all in memory data stores have lots of different data structures, which are being used by a huge number of frameworks. So Redis is not just Redis. It's the entire ecosystem with all these libraries: Laravel, Sidekiq, BullMQ, Celery. All those libraries are kinda the front end of Redis for many developers, and they consume Redis through those frameworks. So I think this is why this ecosystem is extremely useful, because of its APIs and the libraries that were built upon them. And also, again, high throughput scenarios, job queues, caching scenarios, all this became ubiquitous with Redis.
With Aerospike, I believe it's a more narrow use case of a key value store, and also kinda a flavor of persistence around it.
[00:09:03] Tobias Macey:
You mentioned some of the common application use cases that I'm familiar with in the Redis ecosystem, that being Celery, application caching, queuing. And I also know in recent memory, one of the capabilities that was added was Redis streams to compete with the Kafka ecosystem. And I'm wondering broadly, what are some of the ways that you see Redis used beyond some of those common well known patterns, particularly in the context of data pipelines, data engineering workflows?
[00:09:38] Roman Gershman:
Yeah. So, of course, I can only speak from my experience, and I believe the usefulness of Redis APIs decreases with the API generation number. So the most common ones are the first ones, like basic sets and gets, lists, etcetera, and then the recent APIs. Yeah, sure, people are using things like streams, for example, but it's just a minor market share of Redis users. I think, to your question, I wouldn't consider Redis only as a cache or a job queuing engine, even though it started like this. But I see lots of use cases for using Redis or Dragonfly and Valkey as a non-cache, volatile data store. It's not a database.
It doesn't have transactions that can be rolled back, but it's a data store that could be a single source of truth for serving data, though usually for non-business-critical use cases. So, for example, we wouldn't want our bank storing its transactions in something like Redis, but it's totally fine to use it as a feature store, for example. And there are lots of use cases for it being a feature store. So in case, you know, your infrastructure crashes, you can still maybe refill the data store with your cold data.
Maybe it's painful and you lose availability, but it's not the end of the world. So I consider these use cases as non-cache use cases because, usually, this data store is configured without an eviction policy. The, you know, classical cache use case has an eviction policy enabled. But once it is disabled, I consider it a data store use case. And there are lots of use cases like this. For example, gaming companies can store scoreboards per team or player or whatever. Lots of use cases for data engineering applications, GEO APIs, etcetera, etcetera.
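The scoreboard pattern Roman describes maps onto Redis sorted sets (ZADD, ZINCRBY, ZREVRANGE). As a rough illustration, here is a pure-Python stand-in that mirrors those semantics; in practice you would issue the real commands through a Redis or Dragonfly client, so treat this as a sketch of the pattern, not an actual client.

```python
# Illustrative stand-in for a sorted-set scoreboard. One instance plays
# the role of a single sorted-set key in Redis/Dragonfly.

class Scoreboard:
    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        """Insert or update a member's score (ZADD semantics)."""
        self._scores[member] = score

    def zincrby(self, member, delta):
        """Add delta to a member's score, creating it at 0 (ZINCRBY)."""
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def zrevrange(self, start, stop):
        """Members ranked highest score first, inclusive range (ZREVRANGE)."""
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[start:stop + 1]

board = Scoreboard()
board.zadd("alice", 120)
board.zadd("bob", 95)
board.zincrby("bob", 40)        # bob is now at 135
top = board.zrevrange(0, 1)     # top two players, highest first
```

With a real client the calls look nearly identical (`r.zadd("scores", {"alice": 120})`, `r.zrevrange("scores", 0, 1)`), which is why the pattern ports so directly.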
[00:12:20] Tobias Macey:
Another interesting aspect of the work that you've undertaken with Dragonfly is the tribal wisdom that has grown up over the years that it takes ten years for a new database engine to really settle in and grow to maturity and sand off all the rough edges. Obviously, the past five to ten years have seen a massive growth in the number and variety of databases that are being introduced, and many of them are already in production contexts. I know a portion of that acceleration is due to the investment in various components that are used to compose a database, particularly in sort of the disaggregated big data stack, but also in terms of, like, the Seastar framework that I know the folks behind ScyllaDB helped to introduce for taking advantage of modern hardware capabilities and parallelism.
Wondering if you can talk to some of the ways that you approached the implementation of Dragonfly DB and the evaluation and selection of some of those underlying pieces to help accelerate your work, so that you didn't have to build all the way from the storage engine, paging system, etcetera, up through the user interface to get to where you are?
[00:13:38] Roman Gershman:
Yeah. So the answer is kinda embarrassing. I had to invent or reimplement lots of things from scratch. But let's start maybe from the end. One huge assumption that I made when I designed Dragonfly is that we were not going to change the protocol and the compatibility. It was really important for me to make it a drop-in replacement for Redis. So, basically, I didn't want to come up with, I don't know, an HTTP protocol or change the semantics of the commands. And it was quite a challenge, I must say, because Redis wasn't designed for a multi-threaded engine.
So I had to adapt Dragonfly's technology to 250 commands that were not designed for multi-threaded scenarios, for using multiple CPUs. It was quite a challenge. But at least in terms of the product design, I felt that the most important decision was that people that use Laravel, Celery, and BullMQ, or run Lua scripts with Redis, would still be able to run all those components, and that was a deliberate design choice. Now if you're talking about the implementation of Dragonfly, for me, it started as a challenge, basically, to myself. I didn't start from, you know, thinking about opening a startup.
Basically, I left the ElastiCache team and stayed at home during COVID. It wasn't a very nice period of my life. I had to write a lot of lines of code at night because my twin daughters were just born, and I had to juggle everything, and I was not working. But basically, I started with a very simple kinda challenge or milestone: let's implement a toy backend that can only handle SET and GET commands, and I called it midi-redis. And, again, it was more like a learning experience for me: can we really do it? I wanted to learn about shared-nothing architecture, how to do it. Also, a new Linux API had just been released called io_uring, and I was excited about it.
And I took everything as a, you know, learning opportunity. And I spent more or less two or three weeks just coding. And it worked. I wrote a very simple backend, and, by the way, it's still available on GitHub. It's called midi-redis, under my username. And it reached, like, 4,000,000 TPS on a single machine on AWS, and I was really excited about it. And I thought, okay, I have something. Then I collaborated with my cofounder, Oded, and we decided to push it forward. And my next milestone was basically around innovation in hash tables.
What kind of hash table can I use for Dragonfly? And how can I improve the major things that bothered me when I was using Redis? And it was, again, the single-threaded nature of Redis and its lack of resilience when doing snapshotting. Basically, I really hated its snapshotting algorithm that was based on the fork system call. I felt that it was very unreliable. So that was my next challenge, and I solved that challenge. And then I continued with blocking commands. So I started around November 2021, and around April, I felt, okay, we have something workable. I succeeded in implementing my first blocking command. I thought, okay.
Maybe we have something interesting here. What could be interesting to other members of the Redis community? So that's kinda a short version of how I started. And, sorry, yeah, I had to reimplement most of the building blocks myself. Maybe I could have used Seastar for some of the work, but I just felt that I needed this hands-on experience to really understand what I was doing. And I'm not saying that it's the right way to build stuff. For sure, it's not the fastest one, but that's something that worked for me.
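For a sense of what that first milestone involved, here is a toy sketch of a SET/GET handler over the RESP wire format, the kind of thing midi-redis implements. The real project is C++ on io_uring with a shared-nothing design; this Python version only illustrates the framing and dispatch, and skips networking entirely by operating on raw byte buffers.

```python
# RESP commands arrive as an array of bulk strings, e.g.
#   *3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n
# Replies use simple strings (+OK), bulk strings ($3\r\nbar), or
# a null bulk string ($-1) for a missing key.

store = {}

def parse_command(buf: bytes):
    """Parse one complete RESP array of bulk strings into a list of args."""
    lines = buf.split(b"\r\n")
    assert lines[0].startswith(b"*"), "expected a RESP array header"
    n = int(lines[0][1:])
    args, i = [], 1
    for _ in range(n):
        assert lines[i].startswith(b"$"), "expected a bulk string header"
        args.append(lines[i + 1].decode())
        i += 2
    return args

def handle(buf: bytes) -> bytes:
    """Dispatch a single parsed command and build the RESP reply."""
    cmd, *args = parse_command(buf)
    cmd = cmd.upper()
    if cmd == "SET":
        store[args[0]] = args[1]
        return b"+OK\r\n"
    if cmd == "GET":
        val = store.get(args[0])
        if val is None:
            return b"$-1\r\n"
        data = val.encode()
        return b"$%d\r\n%s\r\n" % (len(data), data)
    return b"-ERR unknown command\r\n"

reply = handle(b"*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")  # b"+OK\r\n"
fetched = handle(b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n")            # b"$3\r\nbar\r\n"
```

A production server would additionally handle pipelined and partially received buffers, which is exactly the pipelining effort Roman mentions later in the conversation.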
[00:19:19] Tobias Macey:
In terms of the overall objective of Dragonfly, you mentioned running into limitations working with Redis as far as vertical scalability. You've mentioned some of the limitations in terms of performance, the throughput capabilities that you've unlocked with your earlier experimentation and what you've now built with Dragonfly. I'm curious if you can talk to what were your overall objectives in building this system and some of the ways that the design and implementation as well as the overarching goals of the project have changed from your first phases of experimentation to where you are now where you're actually building a business around this core technology?
[00:20:00] Roman Gershman:
Great question. I don't think I have the full answer even today. So, basically, when I started, I think the Dunning-Kruger effect kicked in. So I thought I had a solution. Right? I did this experiment, came up with a very efficient data store that could answer SET, GET, and maybe other commands very fast. And I thought, okay, I cracked this piece of infrastructure. But then I discovered how complicated the whole ecosystem is, how much effort it is to support Lua scripting properly, how much effort it is to support pipelining properly.
All those things that I hadn't thought about. And my kinda first naive thought was that just by improving the performance of those basic operations, I would be able to win over developers right away. And then I saw, again, all those fragmented use cases that cover the entire ecosystem of Redis with all those frameworks. And we just had to kinda optimize memory usage and efficiency for all those use cases. And, yeah, the strategic goal was cost efficiency. The high-level umbrella of what we try to achieve is cost efficiency.
After that, if we add a new command or new API to, let's say, Dragonfly, it won't move the needle. And there is a natural inertia in the market, right, because of the frameworks. They need to pick up this command in order to use it, but they are, right now, largely dependent on Redis. So I felt that just by disrupting the cost factor of the current use cases, we'd be able to win over the, you know, the market. And we're still working on it. Right? So we started with multi-threading. Then we continued with memory efficiency. We implemented sorted sets.
We greatly improved the snapshotting algorithm. And we started seeing how people switched over to Dragonfly because of these advantages. But there is a long road ahead of us. Really huge use cases, say enterprise-scale use cases, require more sophisticated features like cross-region replication, maybe better auto-scaling support. So basically now we are going into this control plane territory, where we have the basic block of this fairly efficient backend. And now we need to build upon it those sophisticated use cases to support enterprise customers, and that's what we do nowadays.
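The fork-free snapshotting Roman alludes to can be sketched roughly as follows. This is an illustrative simplification under assumed semantics, not Dragonfly's actual algorithm: a background walk serializes entries one by one, and any write to an entry the walk has not yet reached first pushes the pre-write value into the snapshot, yielding a consistent point-in-time view without relying on fork() and copy-on-write pages.

```python
# Sketch of snapshot-while-writing without fork(). Entries still pending
# serialization get their old value captured before any mutation lands.

class SnapshottingStore:
    def __init__(self):
        self._data = {}
        self._snapshot = None   # the snapshot dict being built, or None
        self._pending = None    # keys the background walk has not reached

    def set(self, key, value):
        if self._snapshot is not None and key in self._pending:
            # Capture the point-in-time value before applying the write.
            self._snapshot[key] = self._data[key]
            self._pending.discard(key)
        self._data[key] = value

    def start_snapshot(self):
        self._snapshot, self._pending = {}, set(self._data)

    def step(self, key):
        """One step of the background walk: serialize a single key."""
        if key in self._pending:
            self._snapshot[key] = self._data[key]
            self._pending.discard(key)

    def finish_snapshot(self):
        for key in list(self._pending):
            self.step(key)
        snap, self._snapshot, self._pending = self._snapshot, None, None
        return snap

store = SnapshottingStore()
store.set("a", 1)
store.set("b", 2)
store.start_snapshot()
store.step("a")        # the walk has serialized "a"
store.set("a", 99)     # write after serialization: snapshot unaffected
store.set("b", 42)     # write before serialization: old value captured
snap = store.finish_snapshot()
```

The snapshot ends up holding `{"a": 1, "b": 2}` even though both keys were overwritten mid-snapshot, which is the property fork-based snapshotting buys at the cost of memory spikes under high write throughput.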
[00:23:25] Tobias Macey:
As you have seen teams migrating from Redis or Valkey onto Dragonfly, obviously there are the cost and efficiency gains, and you've mentioned the scalability benefits. But beyond those, I guess, financial motivations, what are some of the ways that you've seen it change how teams think about how and where to apply that key value and queue based functionality in their overall architecture, or the ways that they're thinking about the role of efficient memory storage in their overall system design?
[00:24:09] Roman Gershman:
Yeah. So, actually, our first early adopters were not necessarily people trying to save on costs. These were teams that were self-hosting Redis clusters with, let's say, up to dozens of shards, and it was an operational nightmare for them. And they switched to single-node Dragonfly. So for them, it was the operational pain that has been solved, and they did not necessarily become our customers. Those were mostly community users. And another case, again around operational complexity, I guess: we saw people dividing their infrastructure, their clusters, into separate entities just because of different throughput needs. So for some high throughput use cases, they needed read replicas, for example.
And for others, they could use a single-master cluster. And it was a totally artificial division just because their original cluster couldn't cope with their load. So they had, like, to separate and optimize this. And again, with Dragonfly, with its vertical scale, they could just unify everything. And not only did it simplify their infrastructure, like, reduce the complexity of their infrastructure, but it also reduced the hardware footprint, because now they could average out their workload, and the traffic actually became less volatile just because they unified their infrastructure pieces together.
It's a similar use case, I guess. And another, I would say, anecdotal example is that we had a customer that is very excited about Dragonfly. One thing that they told us is that they accidentally stumbled upon an API that is Dragonfly specific called CL.THROTTLE, which originally came from a module that someone implemented for Redis, and we just implemented it as core Dragonfly functionality. Before that, they used a Golang library for that. It was kinda complicated and very inefficient. And now they have this simple built-in API call that they could use.
And they're very happy about it because it scales very well with Dragonfly. So, basically, there is no, you know, one single answer to your question, but we hear all the time about different advantages of using Dragonfly. Sorry, maybe it's too long, but I'm super excited every time I hear about people using Dragonfly in different ways. It's just another morale boost. So a few months ago, we heard about the Mastodon folks adopting Dragonfly. And it started with them opening a bug in the Dragonfly repo. So it wasn't a small sale for them. But, basically, once we fixed this bug, they were super happy about Dragonfly because, again, it allowed them to reduce their hardware footprint, because Dragonfly is super efficient in memory.
So, if I remember, instead of using 20 gigabytes of RAM, they could store the same workload with six or seven gigabytes with Dragonfly. Those are kinda the use cases. I guess the last one still falls under the cost efficiency umbrella. And the scaling factor is, I would say, around 40% of the use cases, when people can't scale their workload with Redis Cluster. So it's not even about cost efficiency. It's about them scaling horizontally and still seeing their shards overheating. And that's, I would say, at least based on my experience, not the majority of use cases. The majority is still cost efficiency, but still, a significant amount of use cases come because of that as well.
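The CL.THROTTLE command mentioned above (originating in the redis-cell module for Redis) implements GCRA-style rate limiting. Here is a minimal sketch of the underlying idea; the parameter names are illustrative, not the command's exact API, and a real deployment would just call the command and let the server do this atomically.

```python
# GCRA (generic cell rate algorithm) in a few lines: track a "theoretical
# arrival time" (TAT) per key; an action is allowed if admitting it would
# not push TAT further ahead of now than the configured burst allows.

class Throttle:
    def __init__(self, burst, count, period):
        self.emission = period / count             # seconds per action
        self.limit = self.emission * (burst + 1)   # max TAT lead over now
        self.tat = 0.0                             # theoretical arrival time

    def allow(self, now):
        """Return True if one action is permitted at time `now` (seconds)."""
        tat = max(self.tat, now)
        if tat - now > self.limit - self.emission:
            return False                           # would exceed the burst
        self.tat = tat + self.emission
        return True

# Steady rate of 1 action/second with a burst allowance of 2 extra actions:
t = Throttle(burst=2, count=1, period=1)
results = [t.allow(0.0) for _ in range(4)]  # [True, True, True, False]
```

Three actions pass immediately (the steady slot plus the burst of two) and the fourth is rejected; after the bucket drains, e.g. at `t.allow(10.0)`, requests are admitted again.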
[00:29:19] Tobias Macey:
Another interesting aspect of what you're doing is that it is a very memory intensive system. It's very focused on speed and efficiency. But with memory being the key resource that's required, obviously, correctness around memory usage is very important. And I'm wondering what your thoughts are on, if you were to restart it today, would you still go with C++, or do you think that it would be useful to at least use Rust for portions of it? I'm just wondering what your analysis of that in terms of language choice has been as you've continued to build and evolve the system.
[00:29:54] Roman Gershman:
We had a huge amount of bugs around memory semantics and multithreading. We had a huge amount of bugs around other areas where no language would have helped either. But I would answer your question like this: if I were to start today, I probably would still use C++ just because it's the tool that I know best. If I were starting twenty years ago, being twenty years younger, and Rust had existed back then, I would probably have started with Rust. Yes. I just didn't want to spend my time on learning. And it's not just about the new language. It's about the entire ecosystem of libraries that I didn't want to spend time, you know, learning.
But I totally get the advantages of using Rust. I'm not against Rust. I actually used it at AWS. I learned it there a bit, and I enjoyed my short time with Rust back then.
[00:31:02] Tobias Macey:
And in your work of embarking on this experimental project leading to building a new database engine and building a business around it, what are some of the most interesting or unexpected or challenging lessons that you learned on that journey?
[00:31:17] Roman Gershman:
Just about our assumptions of how people use memory stores. It's basically fifty shades of gray. And I kinda knew about it when I was working in the ElastiCache team. But then, when we launched the project in the community, only then did I actually start to understand the complexity of the ecosystem, and how wrong our assumptions were about who would be our first kinda early adopters. Our hope was that it would be cloud users, but it was the opposite. And with cloud users, I think Dragonfly is best for large-scale workloads. But first, you know, we went through the whole journey of onboarding small users, building the data store suitable for them, optimizing even for the smallest use cases, and then slowly going toward bigger and bigger customers.
It was kinda a naive expectation on our side that it would be reversed just because the market is already there. That was our kinda thought: that just because we are building a drop-in replacement and it's fully compatible, it would be easier to onboard bigger customers. And it didn't happen easily. But besides that, I don't know, just random requests from commercial users that were unrelated to the data plane requirements and to the technology of Dragonfly itself. All the mechanics of the cloud system, of the, you know, automated service, all those features that we needed to implement before commercial users would start working with us. That was also kinda unexpected.
Luckily, we have the best team, cloud team, engineering team, and we could solve all those challenges very quickly.
[00:33:44] Tobias Macey:
To the point of Redis compatibility, obviously, that gave you a very focused target to aim for in terms of the implementation that helped with the adoption curve, as far as people not having to reimplement any of their tech stack or their libraries. They could use the existing set of technology that they were using and just swap out one component of it. Now that you have that in place, I'm curious what your thoughts are as far as extensions to that interface that would be useful, or additional features beyond the bounds of Redis that are in consideration for adding to or extending the capabilities and use cases of Dragonfly?
[00:34:28] Roman Gershman:
I could actually ask the same question back to you. Let's say you developed a new SQL database. Do you think that there is an API command that would be disruptive in this market, or something that, let's say, would quickly move MariaDB or MySQL users to your database if you created it? So I don't know what you think about it, but my thinking is that there is no such magical command that would give any, like, quick wins in terms of adoption just because we have it and other technologies do not have it. And we are still on our path to disrupt, I would say, the core attributes of the in-memory data store. And what I mean by that is that our next goal is to make Dragonfly a fusion store, basically something that can use local SSD, very fast NVMe drives, and provide the same look and feel as an in-memory store, but reduce this dependence on memory. Like you mentioned memory before, it's a very important component. Usually, it's also the biggest cost contributor when people use in-memory stores, and that's something that also prevents them from moving huge workloads to the in-memory space. They would love to benefit from low latency, high throughput, but they can't due to high memory costs.
And memory costs are not going down as fast as, you know, SSD costs. So our, I wouldn't say near-term, but maybe midterm goal or milestone is to introduce SSD tiering that would be able to benefit from SSDs and offload a few chunks of data from Dragonfly, and by that, reduce total cost of ownership by, you know, a factor of 10, or let's say five. It will still be a huge win.
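The SSD tiering direction Roman sketches can be illustrated with a toy two-tier store. The names and the simple LRU offload policy here are assumptions for illustration, not Dragonfly's design: hot entries stay in a bounded in-memory dict, the coldest entries spill to a "disk" tier once memory exceeds a budget, and a cold hit promotes the entry back.

```python
from collections import OrderedDict

# Toy two-tier store: `hot` models the RAM-resident working set in LRU
# order, `cold` stands in for an SSD-backed tier.

class TieredStore:
    def __init__(self, memory_budget):
        self.hot = OrderedDict()   # in-memory tier, least recent first
        self.cold = {}             # stand-in for the SSD tier
        self.budget = memory_budget

    def set(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.budget:
            cold_key, cold_val = self.hot.popitem(last=False)
            self.cold[cold_key] = cold_val   # offload the coldest entry

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)        # refresh recency
            return self.hot[key]
        if key in self.cold:                 # "disk" hit: promote to memory
            value = self.cold.pop(key)
            self.set(key, value)
            return value
        return None

store = TieredStore(memory_budget=2)
store.set("a", 1)
store.set("b", 2)
store.set("c", 3)   # memory over budget: "a" is offloaded to the cold tier
```

The cost argument is visible even in the toy: only `budget` entries ever occupy the expensive tier, while the rest live on the cheap one, at the price of a slower path on cold hits.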
[00:37:08] Tobias Macey:
For people who are using Redis or evaluating use cases that are adjacent to Redis in that ecosystem, what are the cases where Dragonfly DB is the wrong choice?
[00:37:21] Roman Gershman:
I think, when people not Dragonfly, but in memory store in general, some companies marketed, Regis, for example, as a database. As an engineer, it makes me hurt inside. Regis is not a database. Dragonfly is not a database. And some maybe naive, thought folks, think that, this can be used, or in memory store can be used as a database. I think, everything that, involves durability and, you know, strong consistency guarantees of, transactional guarantees of, all operations. For those use cases, you can't use in memory store. There is inherent trade off there that in order for a memory store to be fast, it, never records its, actions, so it can't roll back transactions in case they fail. And I think it's a great trade off for use cases that do not require, transactional guarantees.
And there is a huge market for those use cases. But people should be aware of these trade offs. What else? Besides that, I think, anything that, requires high throughput and sub millisecond latency must use in memory data store. And it's kinda an unfortunate outcome. Like, people think, okay. Local SSD, for example, let's build something that uses SSD. The thing with SSD is that it has low latency, actually. It it's really great in terms of latency, but it is limited in terms of IOPS that you can perform compared to memory. Like, several orders of magnitude lower, operations per second that you can do.
And I don't think it's gonna be solved in the near future unless, something like persistent memory will appear again. Like, inter tried this, it didn't work out. But, without it, high throughput use cases won't run anywhere else in a cost efficient manner. That kinda my general advice. Like, use in memory data store for super high throughput use cases, and do not use it if you require transactional semantics and good durability guarantees.
[00:40:03] Tobias Macey:
Are there any other aspects of the work that you're doing on Dragonfly, the overall ecosystem around Redis, the use cases for memory stores that we didn't discuss yet that you'd like to cover before we close out the show?
[00:40:15] Roman Gershman:
Yeah. Sure. We also follow the general trend of, you know, the AI revolution, and we added support for vector search. It's still very much naive, I would say, but it exists. And for people who already use an in-memory data store and need attribute search together with vector search in a single query, which is extremely useful, we can provide a very good alternative to other solutions. So, basically, there is this debate of whether vector search databases that are narrowly focused on solving this one problem will survive in the long term. And the kinda general opinion is that there aren't enough reasons to run a dedicated database only for vector search.
And, as I said, in-memory data stores are very flexible in their use cases, and usually people already run them for other needs. So here is the chance of using something like Dragonfly for classical use cases and also for vector search and document search, like JSON, etcetera.
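Combining attribute filtering with vector similarity in a single query, as Roman describes, can be illustrated with a brute-force sketch in plain Python. All names here (`search`, `docs`, the attribute schema) are hypothetical and for illustration only; this is not Dragonfly's implementation:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(docs, query_vec, attr_filter, k=2):
    """Brute force: filter on attributes first, then rank by similarity."""
    candidates = [d for d in docs
                  if all(d["attrs"].get(key) == val
                         for key, val in attr_filter.items())]
    candidates.sort(key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in candidates[:k]]

docs = [
    {"id": "a", "attrs": {"lang": "en"}, "vec": [1.0, 0.0]},
    {"id": "b", "attrs": {"lang": "en"}, "vec": [0.9, 0.1]},
    {"id": "c", "attrs": {"lang": "de"}, "vec": [1.0, 0.0]},
]
print(search(docs, [1.0, 0.0], {"lang": "en"}))  # → ['a', 'b']
```

Real engines index both the attributes and the vectors instead of scanning, but the query shape, filter first and then rank by similarity, is the same.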
[00:41:47] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:42:04] Roman Gershman:
Interesting question. Actually, I won't lie: I do not have enough insight to answer this, just because I am on the other side of the mirror. Basically, I learn about data store and database needs from my users, from my customers. Surprisingly, I do not use lots of databases myself, ironically, maybe. So I can't answer your question, unfortunately.
[00:42:43] Tobias Macey:
Fair enough. Alright. Well, for anybody who wants to try out Dragonfly, I'll add links in the show notes. I appreciate you taking the time today to join me and share the work that you've done, your journey to building this system, and all of the effort that you're putting into improving the scalability and cost efficiency of these memory store use cases. So thank you again for that, and I hope you enjoy the rest of your day.
[00:43:08] Roman Gershman:
Thank you, Tobias, and, have a good day.
[00:43:18] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey, and today I'm interviewing Roman Gershman about building a high speed in-memory database and the impact of the performance gains on data applications. So, Roman, can you start by introducing yourself?
[00:01:00] Roman Gershman:
Nice meeting you, Tobias. I am Roman, the CTO of Dragonfly DB and also the original author of the Dragonfly project, hosted on GitHub.
[00:01:08] Tobias Macey:
And do you remember how you first got started working in data?
[00:01:15] Roman Gershman:
Yeah. So I joined Google in 2007 in Israel, in Haifa, and my first data-oriented project was building Google Suggest, which is the service everyone uses when they type their search queries into the search box and get all those suggestions. And, believe it or not, it hadn't been launched before that. Nowadays we're all used to it, but back then it was a 20% project of someone in Google, and we took it on ourselves to productionize it. We launched it on YouTube, and the goal was to launch it on Google.com. And there I had my fair share of challenges around scaling technology to Google scale, basically.
You need to support high throughput scenarios with very low latency. And, of course, it was an in-memory domain with lots of preprocessing pipelines, etcetera. That was my initial experience with scaling infrastructure.
[00:02:30] Tobias Macey:
And in terms of the Dragonfly project, can you give a bit of an overview about what it is and how it started and why you decided that it was worth putting the time and energy into it?
[00:02:42] Roman Gershman:
Yeah. Sure. So after Google, I moved to working in a startup called Ubimo. Actually, my manager and the CTO of that startup was Oded, who is now my cofounder in Dragonfly DB. Back then, when we left Google, we kinda didn't know anything about infrastructure outside of the Google camp, and we started learning about all the building blocks and pieces that exist in the open source community. Quickly, we discovered Redis and started using it, and it was incredibly useful in our infrastructure stack. But, unfortunately, it was very painful to manage and scale. We naively thought that we'd be able to take snapshots, and we couldn't, just because our use case was high throughput.
Right, high write throughput. And it just went out of memory. We tried to scale vertically, because everyone on the Internet said it's the fastest data store that exists, very scalable. And then, to my surprise, I discovered that it is not possible to scale it vertically. We kept using it, but I kinda had this thought that maybe it is possible to make it better, and I kept this thought to myself. Then at some point, this startup was sold to a company, and I moved on to another job. I saw that there was an opportunity on the AWS ElastiCache team in Israel, and I just wrote an email to the manager there. I wrote to him, hey, I'm super excited about Redis, and I think I have some ideas for how to improve it. I would be really happy to join the team. And it worked. I joined the ElastiCache team and became an engineer there.
And actually there, I had the chance and the privilege to learn about lots of use cases for how people use Redis in various ways. And really, Redis is super useful. It's a Swiss army knife of various data structures, and also in how people try to use it. I saw that the challenges we experienced in our startup were very much similar to how others tried to scale Redis and failed, the same pain points. And I just felt even more strongly that it was possible to improve the technology behind this incredible product. I tried advocating for this in the company, but it didn't succeed.
And at some point, we just decided to go our separate ways. I mean, I still have a very friendly relationship with Madelyn, who is now the lead developer of Valkey, and I enjoyed working with the team very much. But I decided, okay, why not try myself to do something better?
[00:06:13] Tobias Macey:
And you mentioned Valkey, which is currently the more open alternative to Redis since they changed their licensing. And the other major player that I'm aware of in the key value database space is Aerospike, which also focuses on high throughput and high speed. And I'm wondering what you see as the core problem that is solved, or the capabilities that are unlocked, by having a, quote, unquote, faster Redis.
[00:06:43] Roman Gershman:
Yeah. Sure. So, first of all, full disclosure, I've never run Aerospike. Of course, I've heard about it. Aerospike, at least how I perceive it, is more in the persistent domain. There are lots of key value stores, and I think it's mostly useful for enterprise, very huge scale workloads that have maybe a lower throughput-to-workload-size ratio. So it can be, like, dozens of terabytes of data, but a relatively low throughput use case. With Redis or Valkey or Dragonfly, usually people use it for its very low latency characteristics, sub-millisecond latencies even for p99, and the throughput there can be relatively high.
In addition, as I said before, Redis and all in-memory data stores have lots of different data structures, which are used by a huge number of frameworks. So Redis is not just Redis. It's the entire ecosystem with all these libraries: Laravel, Sidekiq, BullMQ, Celery. All those libraries are kinda the front end of Redis for many developers, and they consume Redis through those frameworks. So I think this is why this ecosystem is extremely useful: because of its APIs and the libraries that were built upon them. And also, again, high throughput scenarios, job queues, caching scenarios, all this became ubiquitous with Redis.
With Aerospike, I believe it's a more narrow use case of a key value store, with a kinda flavor of persistence around it.
[00:09:03] Tobias Macey:
You mentioned some of the common application use cases that I'm familiar with in the Redis ecosystem, that being Celery, application caching, queuing. And I also know in recent memory, one of the capabilities that was added was Redis streams to compete with the Kafka ecosystem. And I'm wondering broadly, what are some of the ways that you see Redis used beyond some of those common well known patterns, particularly in the context of data pipelines, data engineering workflows?
[00:09:38] Roman Gershman:
Yeah. So, of course, I can only speak from my experience, and I believe the usefulness of Redis APIs decreases with the API generation number. The most common ones are the first ones, like basic sets and gets, lists, etcetera, and then the recent APIs. Sure, people are using things like streams, for example, but it's just a minor market share of Redis users. To your question, I wouldn't consider Redis only as a cache or a job queuing engine, even though it started like this. I see lots of use cases for using Redis or Dragonfly or Valkey as a non-cache, volatile data store. It's not a database.
It doesn't have transactions that can be rolled back, but it's a data store that can be a single source of truth for serving data, usually for non-business-critical use cases. So, for example, we wouldn't want our bank storing its transactions in something like Redis, but it's totally fine to use it as a feature store, for example. And there are lots of use cases for it being a feature store. In case your infrastructure crashes, you can still maybe refill the data store from your cold data.
Maybe it's painful and you lose availability, but it's not the end of the world. I consider these non-cache use cases because, usually, this data store is configured without an eviction policy on. The classical cache use case has an eviction policy enabled, but once it's disabled, I consider it a data store use case. And there are lots of use cases like this. For example, gaming companies can store scoreboards per team or player or whatever. Lots of use cases for data engineering applications, GEO APIs, etcetera.
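The distinction Roman draws, eviction policy enabled means cache, eviction disabled means volatile data store, can be sketched with a toy bounded store in Python. The class and its behavior are illustrative assumptions, not Redis or Dragonfly internals:

```python
from collections import OrderedDict

class MemStore:
    """Toy in-memory store: with eviction it behaves like a cache,
    without it like a volatile data store that rejects writes when full."""
    def __init__(self, capacity, evict=True):
        self.capacity, self.evict = capacity, evict
        self.data = OrderedDict()

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            if not self.evict:
                raise MemoryError("OOM: no eviction policy configured")
            self.data.popitem(last=False)  # drop the least-recently-used entry
        self.data[key] = value
        self.data.move_to_end(key)

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # touching a key keeps it "hot"
        return self.data.get(key)

cache = MemStore(2, evict=True)
for k in ("a", "b", "c"):
    cache.set(k, k.upper())
print(cache.get("a"), cache.get("c"))  # → None C  ("a" was evicted)
```

In Redis terms this roughly mirrors `maxmemory-policy allkeys-lru` versus `noeviction`, where the latter returns an out-of-memory error on writes once the memory limit is reached.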
[00:12:20] Tobias Macey:
Another interesting aspect of the work that you've undertaken with Dragonfly is the tribal wisdom that has grown up over the years that it takes ten years for a new database engine to really settle in and grow to maturity and sand off all the rough edges. Obviously, the past five to ten years have seen a massive growth in the number and variety of databases being introduced, and many of them are already in production contexts. I know a portion of that acceleration is due to the investment in various components that are used to compose a database, particularly in the disaggregated big data stack, but also in terms of, like, the Seastar framework that I know the folks behind ScyllaDB helped introduce for taking advantage of modern hardware capabilities and parallelism.
Wondering if you can talk to some of the ways that you approached the implementation of Dragonfly DB and the evaluation and selection of some of those underlying pieces to help accelerate your work so that you didn't have to build all the way from the storage engine, paging system, etcetera, all the way up through to the user interface to be able to get to where you are?
[00:13:38] Roman Gershman:
Yeah. So the answer is kinda embarrassing. I had to invent or reimplement lots of things from scratch. But let's start maybe from the end. One huge assumption that I made when I designed Dragonfly is that we were not going to change the protocol and the compatibility. It was really important for me to make it a drop-in replacement for Redis. Basically, I didn't want to come up with, I don't know, an HTTP protocol or change the semantics of the commands. And it was quite a challenge, I must say, because Redis wasn't designed for a multi-threaded engine.
So I had to adapt Dragonfly's technology to around 250 commands that were not designed for multi-threaded scenarios, for using multiple CPUs. It was quite a challenge. But at least in terms of product design, I felt the most important decision was that people who use Laravel, Celery, and BullMQ, or who run Lua scripts with Redis, would still be able to run all those components, and that was a deliberate design choice. Now, if we're talking about the implementation of Dragonfly, for me it started as a challenge, basically, to myself. I didn't start from, you know, thinking about opening a startup.
Basically, I left the ElastiCache team and stayed at home during COVID. It wasn't a very nice period of my life. I had to write a lot of lines of code at night because my twin daughters were just born, and I had to juggle everything, and I was not working. But basically, I started with a very simple kinda challenge or milestone: let's implement a toy backend that can only handle SET and GET commands, and I called it midi-redis. Again, it was more like a learning experience for me: can we really do it? I wanted to learn about shared-nothing architecture and how to do it. Also, a new Linux API called io_uring had just been released, and I was excited about it.
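A toy backend like the midi-redis experiment described here mainly needs to parse the Redis wire protocol (RESP) and dispatch SET and GET. Below is a minimal single-threaded sketch of just the parsing and dispatch logic, with no networking or io_uring; it is an illustration, not the actual midi-redis code:

```python
def parse_resp(buf):
    """Parse one RESP array command, e.g. b'*2\\r\\n$3\\r\\nGET\\r\\n$3\\r\\nfoo\\r\\n',
    into a list of strings like ['GET', 'foo']."""
    lines = buf.split(b"\r\n")
    n = int(lines[0][1:])          # '*3' -> 3 elements follow
    parts, i = [], 1
    for _ in range(n):
        # lines[i] is a bulk-string header like '$3'; lines[i+1] is the payload
        parts.append(lines[i + 1].decode())
        i += 2
    return parts

STORE = {}

def handle(cmd):
    """Dispatch the two commands a toy backend needs: SET and GET."""
    name = cmd[0].upper()
    if name == "SET":
        STORE[cmd[1]] = cmd[2]
        return "+OK\r\n"                 # RESP simple string reply
    if name == "GET":
        val = STORE.get(cmd[1])
        # Bulk string reply, or the RESP null bulk string for a miss.
        return f"${len(val)}\r\n{val}\r\n" if val is not None else "$-1\r\n"
    return "-ERR unknown command\r\n"

print(handle(parse_resp(b"*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")))
print(handle(parse_resp(b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n")))
```

In a shared-nothing design, the key space would be partitioned so each thread owns a disjoint slice of `STORE` and runs this loop independently.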
I took everything as a learning opportunity, and I spent more or less two or three weeks just coding. And it worked. I wrote a very simple backend, and, by the way, it's still available on GitHub. It's called midi-redis, under my username. It reached, like, 4,000,000 TPS on a single machine on AWS, and I was really excited about it. I thought, okay, I have something. Then I collaborated with my cofounder, Oded, and we decided to push it forward. And my next milestone was around innovation in hash tables.
What kind of hash table could I use for Dragonfly? And how could I improve the major things that bothered me when I was using Redis? That was, again, the single-threaded nature of Redis and its lack of resilience when doing snapshotting. Basically, I really hated its snapshotting algorithm, which was based on the fork system call. I felt that it was very unreliable. So that was my next challenge, and I solved it. Then I continued with blocking commands. I started around November of 2021, and around April I felt, okay, we have something workable. I succeeded in implementing my first blocking command. I thought, okay.
Maybe we have something interesting here. What could be interesting to other members of the Redis community? So that's kinda the short version of how I started. And, sorry, yeah, I had to reimplement most of the building blocks myself. Maybe I could have used Seastar for some of the work, but I just felt that I needed this hands-on experience to really understand what I was doing. And I'm not saying it's the right way to build stuff. For sure, it's not the fastest one, but that's something that worked for me.
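One fork-free alternative to the fork-based snapshotting Roman criticizes is entry-level copy-on-write: version each key, and have the write path serialize an entry's old value before overwriting it while a snapshot is in progress. The sketch below illustrates that general idea only; it is not Dragonfly's actual algorithm:

```python
class Store:
    """Entry-level copy-on-write snapshotting: no fork() needed.
    Each key remembers the snapshot epoch it was last captured under."""
    def __init__(self):
        self.data = {}
        self.versions = {}
        self.epoch = 0
        self.snapshot_buf = None

    def start_snapshot(self):
        self.epoch += 1
        self.snapshot_buf = {}

    def set(self, key, value):
        # If a snapshot is running and this key hasn't been captured yet,
        # capture its *old* value before overwriting (copy-on-write).
        if self.snapshot_buf is not None and self.versions.get(key, 0) < self.epoch:
            if key in self.data:
                self.snapshot_buf[key] = self.data[key]
            self.versions[key] = self.epoch
        self.data[key] = value

    def finish_snapshot(self):
        # Sweep everything the write path didn't already capture.
        for key, value in self.data.items():
            if self.versions.get(key, 0) < self.epoch:
                self.snapshot_buf[key] = value
                self.versions[key] = self.epoch
        buf, self.snapshot_buf = self.snapshot_buf, None
        return buf

s = Store()
s.set("a", 1); s.set("b", 2)
s.start_snapshot()
s.set("a", 99)                 # mutation during snapshot: old value captured
snap = s.finish_snapshot()
print(snap)  # → {'a': 1, 'b': 2}
```

The point-in-time property holds because each key is captured exactly once per snapshot epoch, by either the write path or the final sweep, without duplicating the whole dataset the way `fork` can under heavy write load.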
[00:19:19] Tobias Macey:
In terms of the overall objective of Dragonfly, you mentioned running into limitations working with Redis as far as vertical scalability. You've mentioned some of the limitations in terms of performance, the throughput capabilities that you've unlocked with your earlier experimentation and what you've now built with Dragonfly. I'm curious if you can talk to what were your overall objectives in building this system and some of the ways that the design and implementation as well as the overarching goals of the project have changed from your first phases of experimentation to where you are now where you're actually building a business around this core technology?
[00:20:00] Roman Gershman:
Great question. I don't think I have the full answer even today. Basically, when I started, I think the Dunning-Kruger effect kicked in. I thought I had a solution, right? I did this experiment, came up with a very efficient data store that could answer SET, GET, and maybe other commands very fast, and I thought, okay, I've cracked this piece of infrastructure. But then I discovered how complicated the whole ecosystem is, how much effort it takes to support Lua scripting properly, how much effort to support pipelining properly.
All those things that I hadn't thought about. My kinda first naive thought was that just by improving the performance of those basic operations, I would be able to win over developers right away. Then I saw, again, all those fragmented use cases that cover the entire ecosystem of Redis with all those frameworks, and we just had to optimize memory usage and efficiency for all those use cases. So, yeah, the strategic goal was cost efficiency. That's kinda the high-level umbrella of what we try to achieve: cost efficiency.
After that, if we add a new command or new API to, let's say, Dragonfly, it won't move the needle. And there is a natural inertia in the market, right, because of the frameworks. They need to pick up this command in order to use it, but they are, right now, largely dependent on Redis. So I felt that just by disrupting the cost factor of the current use cases, we'd be able to win over the market. And we're still working on it, right? We started with multi-threading, then we continued with memory efficiency. We implemented sorted sets.
We greatly improved the snapshotting algorithm, and we started seeing people switch over to Dragonfly because of these advantages. But there is a long road ahead of us. Really huge, enterprise-scale use cases require more sophisticated features like cross-region replication and maybe better autoscaling support. So basically, now we are going into control plane territory, where we have the basic building block of this fairly efficient backend, and now we need to build those sophisticated use cases upon it to support enterprise customers. That's what we do nowadays.
[00:23:25] Tobias Macey:
As you have seen teams migrating from Redis or Valkey onto Dragonfly, obviously there are the cost and efficiency gains, and you've mentioned the scalability benefits. But beyond those, I guess, financial motivations, what are some of the ways that you've seen it change how teams think about how and where to apply that Redis and key value and queue based functionality in their overall architecture, or the ways that they're thinking about the role of efficient memory storage in their overall system design?
[00:24:09] Roman Gershman:
Yeah. So, actually, our first early adopters were not necessarily people trying to save on costs. These were teams that were self-hosting Redis clusters with, let's say, up to dozens of shards, and it was an operational nightmare for them. And they switched to single-node Dragonfly. So for them, it was the operational pain that was solved, and they didn't necessarily become our customers. That was mostly community users. Another case, again around operational complexity, I guess: we saw people dividing their infrastructure, their clusters, into separate entities just because of different throughput needs. For some high throughput use cases, they needed read replicas, for example.
And for others, they could use a single-master cluster. It was a totally artificial division, just because their original cluster couldn't cope with their load, so they had to separate and optimize this. With Dragonfly, with its vertical scale, they could just unify everything. And not only did it simplify their infrastructure, reducing its complexity, but it also reduced the hardware footprint, because now they could average out their workload, and the traffic actually became less volatile just because they unified their infrastructure pieces.
It's a similar use case, I guess. Another, I would say, anecdotal example: we have a customer that is very excited about Dragonfly. One thing they told us is that they accidentally stumbled upon a Dragonfly-specific API called CL.THROTTLE, which originally came from a module that someone implemented for Redis and which we implemented as core Dragonfly functionality. Before that, they used a Golang library for it, which was kinda complicated and very inefficient. Now they have this simple built-in API call that they can use.
And they're very happy about it because it scales very well with Dragonfly. So, basically, there is no one single answer to your question, but we hear all the time about different advantages of using Dragonfly. Sorry, maybe this is too long, but I'm super excited every time I hear about people using Dragonfly in different ways; it's just another morale boost. A few months ago, we heard about the Mastodon folks adopting Dragonfly. It started with them opening a bug in the Dragonfly repo, so it wasn't a smooth sail for them. But basically, once we fixed this bug, they were super happy about Dragonfly because, again, it allowed them to reduce their hardware footprint, Dragonfly being super memory efficient.
If I remember correctly, instead of using 20 gigabytes of RAM, they could store the same workload in six or seven gigabytes with Dragonfly. Those are kinda the use cases. I guess the last one still falls under the cost efficiency umbrella. And the scaling factor is around 40% of the use cases, when people can't scale their workload with Redis Cluster. So it's not even about cost efficiency. It's about them scaling horizontally and still seeing their shards overheating. Based on my experience, that's not the majority of use cases. The majority is still cost efficiency, but still, a significant amount of use cases come because of that as well.
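The CL.THROTTLE command mentioned here comes from the redis-cell module, which implements the GCRA ("leaky bucket as a meter") rate-limiting algorithm. Below is a minimal, hedged sketch of GCRA in Python, with a simplified interface compared to the real command:

```python
class GCRALimiter:
    """Minimal GCRA rate limiter: a leaky bucket tracked as a single
    timestamp (the theoretical arrival time), the algorithm behind
    the redis-cell module's CL.THROTTLE command."""
    def __init__(self, rate_per_sec, burst):
        self.interval = 1.0 / rate_per_sec   # emission interval per token
        self.burst = burst
        self.tat = 0.0                       # theoretical arrival time

    def allow(self, now):
        tat = max(self.tat, now)
        # Allow if this request wouldn't push us past the burst allowance.
        if tat - now <= (self.burst - 1) * self.interval:
            self.tat = tat + self.interval
            return True
        return False

lim = GCRALimiter(rate_per_sec=1, burst=3)
results = [lim.allow(now=0.0) for _ in range(5)]
print(results)  # → [True, True, True, False, False]
```

The real command also returns values such as the remaining quota and a retry-after hint; this sketch keeps only the allow/deny decision. Storing just one float per key is what makes GCRA cheap to run inside a data store.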
[00:29:19] Tobias Macey:
Another interesting aspect of what you're doing is that it is a very memory intensive system. It's very focused on speed and efficiency. But with memory being the key resource that's required, obviously, correctness around memory usage is very important. And I'm wondering what your thoughts are: if you were to restart it today, would you still go with C++, or do you think that it would be useful to at least use Rust for portions of it? I'm just wondering what your analysis of the language choice has been as you've continued to build and evolve the system.
[00:29:54] Roman Gershman:
We had a huge amount of bugs around memory semantics and multi-threading. We also had a huge amount of bugs in other areas where no language would have helped. But I would answer your question like this: if I were starting today, I'd probably still use C++, just because it's the tool that I know best. If I were starting twenty years ago, being twenty years younger, and Rust had existed back then, I would probably have started with Rust. I just didn't want to spend my time on learning. And it's not just about the new language; it's about the entire ecosystem of libraries that I didn't want to spend time learning.
But I totally get the advantages of using Rust. I'm not against Rust. I actually used it in AWS. I learned it there a bit, and I enjoyed my short time with Rust back then.
[00:31:02] Tobias Macey:
And in your work of embarking on this experimental project leading to building a new database engine and building a business around it, what are some of the most interesting or unexpected or challenging lessons that you learned on that journey?
[00:31:17] Roman Gershman:
Just about our assumptions of how people use memory stores. It's basically fifty shades of gray. And I kinda knew about it when I was working on the ElastiCache team. But when we launched the project in the community, only then did I actually start to understand the complexity of the ecosystem, and to question our assumptions about who would be our first early adopters. Our hope was that it would be cloud users, but it was the opposite. And with cloud users, I think Dragonfly is best for large scale workloads. But first we went through the whole journey of onboarding small users, building the data store to suit them, optimizing even for the smallest use cases, and then slowly going toward bigger and bigger customers.
It was, kinda maybe naive expectation on our side that it's gonna be reversed just because, the market is already there. That was our kinda thought that just because we are building drop in replacement and it's fully compatible, it will be easier to onboard bigger customers. And it didn't happen, easily. But besides that, I don't know, just a random request from, commercial users that were unrelated to, maybe, to the data plane requirements and to the technology of data of Dragonfly itself. All the mechanics of the cloud system of the, you know, automated service, all those features that we need to implement before commercial users, start working with us. It was also kinda unexpected.
Luckily, we have the best cloud team and engineering team, and we could solve all those challenges very quickly.
[00:33:44] Tobias Macey:
To the point of Redis compatibility, obviously, that gave you a very focused target to aim for in terms of the implementation that helped with the adoption curve as far as people not having to reimplement any of their tech stack, their libraries. They could use the existing set of technology that they were using. They just swap out one component of it. Now that you have that in place, I'm curious what your thoughts are as far as extensions to that interface that would be useful or additional features beyond the bounds of Redis that are, in consideration for adding to or extending the capabilities and use cases of Dragonfly?
[00:34:28] Roman Gershman:
I could actually ask the same question back to you. Let's say you developed a new SQL database. Do you think there is an API command that would be disruptive in this market, or something that, let's say, would quickly move MariaDB or MySQL users to your database if you created it? I don't know what you think about it, but my thinking is that there is no such magical command that would give any quick wins in terms of adoption just because we have it and other technologies do not. We are still on our path to disrupt, I would say, the core attributes of an in-memory data store. And what I mean by that is that our next goal is to make Dragonfly a fusion store: basically, something that can use local SSD, very fast NVMe drives, and provide the same look and feel as an in-memory store, but reduce this dependence on memory. Like you mentioned before, memory is a very important component. Usually, it's also the biggest cost contributor when people use in-memory stores, and that's something that also prevents them from moving huge workloads to the in-memory space. They would love to benefit from low latency and high throughput, but they can't due to high memory costs.
And memory costs are not going down as fast as, you know, SSD costs. So our, I wouldn't say near-term, but maybe midterm milestone is to introduce SSD tiering that would benefit from SSDs and offload chunks of data from Dragonfly, and by that reduce the total cost of ownership by, you know, a factor of 10. Even a factor of five would still be a huge win.
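The SSD tiering milestone can be pictured as a two-tier store: a bounded RAM tier that spills its coldest entries to SSD and pages them back on access. Everything below is a toy illustration, with a plain dict standing in for the SSD; it is not Dragonfly's planned design:

```python
from collections import OrderedDict

class TieredStore:
    """Toy fusion store: a bounded RAM tier that spills cold entries
    to a (simulated) SSD tier and transparently reads them back."""
    def __init__(self, ram_slots):
        self.ram = OrderedDict()   # hot tier, kept in LRU order
        self.ssd = {}              # stand-in for an NVMe-backed file
        self.ram_slots = ram_slots

    def set(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_slots:
            cold_key, cold_val = self.ram.popitem(last=False)
            self.ssd[cold_key] = cold_val    # offload the coldest entry

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)   # "page in" from SSD
            self.set(key, value)        # promote back to the hot tier
            return value
        return None

store = TieredStore(ram_slots=2)
for i in range(4):
    store.set(f"k{i}", i)
print(sorted(store.ssd))   # → ['k0', 'k1']
print(store.get("k0"))     # → 0 (promoted back; another key spills out)
```

The cost argument follows directly: if most of the keyspace is cold, most bytes live on the cheaper SSD tier while the API and the hot-path latency stay those of an in-memory store.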
[00:37:08] Tobias Macey:
For people who are using Redis or evaluating use cases that are adjacent to Redis in that ecosystem, what are the cases where Dragonfly DB is the wrong choice?
[00:37:21] Roman Gershman:
I think, when people... not Dragonfly, but in-memory stores in general. Some companies marketed Redis, for example, as a database. As an engineer, that makes me hurt inside. Redis is not a database. Dragonfly is not a database. And some maybe naive folks think that an in-memory store can be used as a database. I think anything that involves durability and, you know, strong consistency guarantees, transactional guarantees for all operations, for those use cases you can't use an in-memory store. There is an inherent trade-off there: in order for a memory store to be fast, it never records its actions, so it can't roll back transactions in case they fail. And I think it's a great trade-off for use cases that do not require transactional guarantees.
And there is a huge market for those use cases. But people should be aware of these trade-offs. What else? Besides that, I think anything that requires high throughput and sub-millisecond latency must use an in-memory data store. And it's kind of an unfortunate outcome. Like, people think, okay, local SSD, for example, let's build something that uses SSD. The thing with SSD is that it actually has low latency. It's really great in terms of latency, but it is limited in terms of the IOPS you can perform compared to memory. Like, several orders of magnitude fewer operations per second.
And I don't think it's going to be solved in the near future unless something like persistent memory appears again. Like, Intel tried this, and it didn't work out. But without it, high-throughput use cases won't run anywhere else in a cost-efficient manner. That's kind of my general advice: use an in-memory data store for super high-throughput use cases, and do not use it if you require transactional semantics and good durability guarantees.
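Roman's point about the IOPS gap can be put into rough numbers. The figures below are ballpark assumptions for illustration, not benchmarks: a fast local NVMe drive sustains on the order of a million random-read IOPS, while DRAM can serve small key lookups on the order of a hundred million per second.

```python
# Ballpark figures (assumptions for illustration, not measurements):
nvme_random_read_iops = 1_000_000    # ~10^6 for a fast local NVMe drive
dram_lookups_per_sec = 100_000_000   # ~10^8 small key lookups from DRAM

# How many times more operations per second memory can serve than SSD
gap = dram_lookups_per_sec / nvme_random_read_iops
print(f"DRAM vs. NVMe throughput gap: ~{gap:.0f}x")
```

With these numbers the gap is about two orders of magnitude, and slower commodity drives widen it further, which is what makes very high-throughput workloads impractical to serve from SSD alone.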
[00:40:03] Tobias Macey:
Are there any other aspects of the work that you're doing on Dragonfly, the overall ecosystem around Redis, the use cases for memory stores that we didn't discuss yet that you'd like to cover before we close out the show?
[00:40:15] Roman Gershman:
Yeah, sure. We also follow the general trend of, you know, the AI revolution, and we added support for vector search. It's still very naive, I would say, but it exists. And for people who already use an in-memory data store and need attribute search together with vector search in a single query, which is extremely useful, we can provide a very good alternative to other solutions. So, basically, there is this debate of whether vector search databases that are narrowly focused on solving this one problem will survive in the long term. And the general opinion is that, I mean, they don't have enough reasons to exist, to run a dedicated database only for vector search.
And, as I said, in-memory data stores are very flexible in their use cases, and usually people already run them for other needs. So here is the chance of using something like Dragonfly for classical use cases and also for vector search and document search, like JSON, etcetera.
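The combined query Roman mentions, an attribute filter and a vector nearest-neighbor search evaluated together, can be illustrated with a brute-force toy in plain Python. This is a sketch of the idea only, not Dragonfly's implementation or API; the function name and document layout are invented for the example.

```python
import math

def knn_with_filter(docs, query_vec, attribute, value, k):
    """Toy hybrid search: keep only docs matching the attribute filter,
    then rank the survivors by cosine similarity to the query vector."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Attribute filtering happens first, so the KNN only ranks candidates
    # that already satisfy the structured predicate.
    candidates = [d for d in docs if d.get(attribute) == value]
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec),
                    reverse=True)
    return ranked[:k]

docs = [
    {"id": "a", "category": "shoes", "vec": [1.0, 0.0]},
    {"id": "b", "category": "shoes", "vec": [0.0, 1.0]},
    {"id": "c", "category": "bags",  "vec": [1.0, 0.0]},
]
# Only "shoes" documents are considered, then ranked by similarity.
top = knn_with_filter(docs, [1.0, 0.1], "category", "shoes", k=1)
print(top[0]["id"])  # prints "a"
```

In the RediSearch-style query syntax used by Redis-compatible search implementations, the same intent is expressed server-side in a single command, roughly `FT.SEARCH idx "(@category:{shoes})=>[KNN 1 @vec $q]"`, which is the one-round-trip combination Roman is describing.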
[00:41:47] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:42:04] Roman Gershman:
Interesting question. Actually, I won't lie, I do not have enough insight to answer this, just because I am on the other side of the mirror. Basically, I learn about data store and database needs from my users, from my customers. Surprisingly, I do not use lots of databases myself, ironically, maybe. So I can't answer your question, unfortunately.
[00:42:43] Tobias Macey:
Fair enough. Alright. Well, for anybody who wants to try out Dragonfly, I'll add links in the show notes. I appreciate you taking the time today to join me and share the work that you've done, your journey to building this system, and all of the effort that you're putting into improving the scalability and cost efficiency of these memory store use cases. So thank you again for that, and I hope you enjoy the rest of your day.
[00:43:08] Roman Gershman:
Thank you, Tobias, and, have a good day.
[00:43:18] Tobias Macey:
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Overview
Meet Roman Gershman
The Dragonfly Project
Redis and Its Ecosystem
Building Dragonfly DB
Objectives and Evolution of Dragonfly
Impact on Teams and Use Cases
Lessons Learned and Challenges
Future Directions and Innovations
When Not to Use Dragonfly
Closing Thoughts and Contact Information