Summary
Real-time capabilities have quickly become an expectation for consumers. The complexity of providing those capabilities is still high, however, making it more difficult for small teams to compete. Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack
- Your host is Tobias Macey and today I'm interviewing DeVaris Brown about the impact of real-time data on business opportunities and risk profiles
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Meroxa is and the story behind it?
- How have the focus and goals of the platform and company evolved over the past 2 years?
- Who are the target customers for Meroxa?
- What problems are they trying to solve when they come to your platform?
- Applications powered by real-time data were the exclusive domain of large and/or sophisticated tech companies for several years due to the inherent complexities involved. What are the shifts that have made them more accessible to a wider variety of teams?
- What are some of the remaining blockers for teams who want to start using real-time data?
- With the democratization of real-time data, what are the new categories of products and applications that are being unlocked?
- How are organizations thinking about the potential value that those types of apps/services can provide?
- With data flowing constantly, there are new challenges around oversight and accuracy. How does real-time data change the risk profile for applications that are consuming it?
- What are some of the technical controls that are available for organizations that are risk-averse?
- What skills do developers need to be able to effectively design, develop, and deploy real-time data applications?
- How does this differ when talking about internal vs. consumer/end-user facing applications?
- What are the most interesting, innovative, or unexpected ways that you have seen Meroxa used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Meroxa?
- When is Meroxa the wrong choice?
- What do you have planned for the future of Meroxa?
Contact Info
- @devarispbrown on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.
Links
- Meroxa
- Kafka
- Kafka Connect
- Conduit - Golang Kafka Connect replacement
- Pulsar
- Redpanda
- Flink
- Beam
- Clickhouse
- Druid
- Pinot
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines. RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team. RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again. Visit [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack) to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Legacy CDPs charge you a premium to keep your data in a black box. RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost effective solution. Plus, it gives you more technical controls so you can fully unlock the power of your customer data. Visit dataengineeringpodcast.com/rudderstack today to take control of your customer data. Your host is Tobias Macey. And today, I'm interviewing DeVaris Brown about the impact of real time data on business opportunities and risk profiles. So, DeVaris, welcome back to the show. For people who haven't listened to your previous episode where you introduced Meroxa, can you just give a bit of an introduction?
[00:00:53] Unknown:
Yeah. Thank you again for having me on. I know we talked at, what, the beginning of COVID, and, wow, time flies. We've evolved a little bit. Still the same mission. We want to make real time data the default for any organization, but we have changed our focus to empowering regular software engineers to utilize real time data with their favorite programming language and existing developer workflows to innovate and solve problems. I think that's really what we do. We built a platform that allows you to ingest, process, orchestrate, and stream real time data with just regular code.
You know, kind of a big middle finger out to the rest of the drag and drop, low code, no code world where, you know, you kinda automate copy paste, and then you have to add another point solution on at the end of that to actually get usage out of that data. Like, you can do everything with the code, you know, in place and not have to rip and replace anything from a platform or a process standpoint. So that's Meroxa in a nutshell. And how about DeVaris in a nutshell?
[00:02:13] Unknown:
How about just you? Yeah. Me?
[00:02:16] Unknown:
CEO, cofounder, you know, Meroxa. Yeah. That's pretty much it. That's all encompassing. And I'm originally from Chicago. Went to University of Illinois. Been out in the valley, in total, for probably 13 years now. I was here, too, moved away, then, you know, I kinda came back when I had a company that got acquired by another company out here. So, yeah, I mean, I've been a product manager. So if you use Windows I mean, not Windows, but Microsoft Azure, Zendesk, Heroku. You know, you use the DeVaris stack in some way, shape, or form. My whole life I shouldn't say my whole life, but my professional life has been mostly focused on empowering developers to be more productive.
And at a lot of the places I've worked, that's what I've done, and I'm not stopping that while I'm doing Meroxa.
[00:03:12] Unknown:
You've mentioned already a bit about kind of what it is that you're building there and some of the ways that the focus and goals of the platform and company have evolved. And in terms of the target customers, you mentioned that you're now focused on software engineers, developers. I'm wondering if you can give a bit of nuance as to are there any particular kind of industries or problem areas where you are seeing a lot of adoption or that you have been kind of focusing on developing towards? Or if there's a particular avenue that you have found works well for your entree into a given company, whether it's bottom up from developers who are tasked with just make this thing work, and then they find Meroxa and say, oh, hey. This does everything for me, or is it more of a kind of top down, or do they meet in the middle? Yeah. Right now it's completely top down. Like, the
[00:04:02] Unknown:
basically, what we found is that, you know, we do really, really great work for defense, in the highly regulated industries, because of, you know, their requirements to be secure in the moment all the time. So, you know, banks, insurance, health, defense, intelligence, that type of stuff, that's really where we found a general nexus, the, you know, center of gravity around the people who utilize us and gravitate towards us. I mean, really, any large organization now, we basically go in and say, like, look. Your data team is siloed off, you know, inundated with requests from all parts of the business, and they're frankly overwhelmed. So what does it look like to turn your entire engineering organization into a data team using the Meroxa tools? And you don't have to rip and replace anything.
And, you know, they can do it with the existing code. And everybody's just like, tell me more. So, you know, especially in this time when people have been, you know, laying folks off and all of that. Right? Like, every single company is a data company, whether you like it or not. Right? And so that really is a gift and a curse for us because, you know, everybody uses different data in different ways. Everybody stores it in different ways. Right? Like, you know, I go into some spots, and everything's in Excel. Or, you know, you go to some place and it's, like, we have a microservice for everything. Right? I don't know, man. It's wild to see. But really, all we're doing is just trying to reduce that complexity and give people the tools to innovate with real time data. And, you know, that, by and large, has been the reason why people have been able to, you know, build a relationship with us and our platform and the things that we're doing. And, data
[00:05:55] Unknown:
that is flowing at us in real time, continuously, unbounded, whatever nomenclature you want to use, generally requires a lot of upfront technical and infrastructure investment. You know, usually that means, okay, we're going to use Kafka, or lately there are a few more entrants into the market, whether you wanna use Pulsar or Redpanda or what have you. And, you know, the overall ideal of I want to be able to build real time applications, when it first came into the realm of possibility, was usually only for the big tech companies of the world, you know, Netflix, Google, Facebook, etcetera, because they had all of the engineers who built all these systems in the first place. They had the investment money to be able to put into actually building these systems, and so everybody else was saying, oh, I want real time too, and would start down the path, realize that it was prohibitively complicated and expensive, and then settle for some kind of half measure.
And I'm wondering what you have seen as the overall shifts in the industry, in the technology, in the kind of level of understanding and sophistication across kind of your standard engineers that have made this a tractable problem
[00:07:04] Unknown:
and have brought real time into the realm of possibility for a larger audience. Yeah. I mean, that's really, like, the reason why we exist. Right? We looked at you know, the writing was on the wall, and we saw that all these big companies are using real time data as their competitive advantage. Right? You mentioned Netflix. Like, could you imagine, you know, it's late at night or it's the weekend, you bring somebody over for some Netflix and chill, and you gotta wait 10 minutes to get a movie recommendation because somebody set up a data app that is, you know, doing full table scans and compiling, you know, doing models and recommendations and all that type of stuff, right, on your data warehouse? Like, that wouldn't fly. Or, you know, you're coming home from a bar late at night and you have to wait, you know, 5 to 10 minutes to get a recommended driver. Right? Like, real time data is customer experience today. Like, it is a prerequisite for that. And so you're right in that, you know, you look at these companies, Netflix has got 2,000 data engineers.
Right? Like, I mean, they have more data engineers than some people have regular engineers. Right? Like, that's insane. So who's gonna compete with that? And that's really why, and I'm not trying to turn this into a Meroxa commercial, but that's really why we exist: that infrastructure should be democratized so that people can start bringing that business value. And then the innovation and, you know, like, the guardrails that we provide for people to innovate at that level, that's something that is gonna usher in a whole new set of experiences and apps that weren't possible to people because they just simply didn't have the money and resources and expertise to make those things happen. And, like, that's the benefit that we have. Right? It's like, you can go from, you know, a single person startup, you know, or somebody that's tinkering on an idea, all the way up to, you know, old world legacy, ecommerce, banking, defense, you know, like, all that type of stuff. I think that's really the mindset that people have to start thinking with. And what, you know, I'm a big proponent of is that real time data should just be the default. Alright? Real time data, having data flowing in real time, doesn't necessarily mean event driven. Alright? But it just means that as, you know, actions are happening across your platform, your app, whatever it is, that activity is getting tracked somewhere in real time. And it's very granular. It's very actionable, and you can do things with it if you want to in the moment that you need to. And that's really the you know, if we can get all of these companies at a base level to start doing that, I think you're gonna start to see better apps and better customer experiences.
I mean, like, that's really the goal. Right? It's just to do that. And, you know, for me, there's a ton of these what I call intelligent insights and analytics companies where it's like, yo, you gotta get the data to us, and then, like, we can give you all of this. But the hard part is actually, like, getting the data structured in a format where their, you know, kind of black box can go do the thing and provide the value at that point. Right? And it's like, we handle that first part, but also we give you the guardrails to make that second part easier, scalable, and, you know, that type of thing. So that's really the value prop and sell that we have when we go talk to an organization.
[00:10:28] Unknown:
And beyond the baseline infrastructure of I have the pipes, I'm able to get data in, and it's able to get processed and spit back out the other end in real time, what are some of the other barriers or aspects of the kind of complexity of working with real time data that you see teams run into once they say, okay, I've got Meroxa in. It solves all my plumbing problems. Now what? Yeah. Exactly.
[00:10:51] Unknown:
It's a different access pattern. Right? Like, people are used to writing a select star from orders kinda query. Right? They aren't really, like, doing select star from, you know, cart update or something like that. Right? Like, just the granularity of the data. Like, people are just used to dealing with snapshots, right, in the CSV world. Right? And I think part of the problem is that folks really aren't oriented for that type of world where it's super granular and it's micro versus, like, we get in a CSV dump full of just data upon data upon data. Right? And I think, like, framing your mindset around the different types of data that can come in, and having it be consistent,
I think that's the thing. The other challenge around working with real time data is, like, order guarantees and delivery semantics and, like, you know, all those types of things. Right? And so we handle a lot of that, but, I mean, you know, deduping is a problem for everybody. Right? Like, you know, scaling is a problem for most people, not us, but, you know, a lot of people. Right? And so just understanding how to deal with those types of patterns and architectural decisions and things, that's, you know, just a new world that we're trying to usher in. Right? Like, everybody's talking about data apps and, you know, data management, all this stuff, but they talk about it from, like, the outskirts of it. Right? Like, you know, there's data lineage. There's data governance. There's data observability.
There's metadata management. It's like, all of that doesn't necessarily talk about the actual experience of how you're using that data to provide customer experiences. Alright? Like, all that stuff is just pure overhead. And, yeah, it can add things, but it's like, I just need to go from point a to point b. And you're just like, you know, let me throw a spoiler on the car. And it's like, I don't really need that right now. Like, I just need to go from here to here. Right? And so I feel like there needs to be a simplification of the data landscape so people don't see this as too much of a herculean task.
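The delivery-semantics and deduping challenges described above can be sketched with an idempotent consumer: under at-least-once delivery, the broker may redeliver an event, so the consumer filters on a unique ID. This is an illustrative plain-Python sketch, not any vendor's API; the event shape and `event_id` field are assumptions:

```python
# Sketch of an idempotent consumer for an at-least-once stream.
# Each event is assumed to carry a unique event_id; the names here
# are illustrative, not any specific platform's API.

def dedupe(events, seen=None):
    """Yield each event exactly once, even if the broker redelivers it."""
    seen = set() if seen is None else seen
    for event in events:
        if event["event_id"] in seen:
            continue          # duplicate delivery: drop it
        seen.add(event["event_id"])
        yield event

# A redelivered copy of event "a" is filtered out downstream.
stream = [
    {"event_id": "a", "type": "cart_update", "qty": 2},
    {"event_id": "b", "type": "cart_update", "qty": 1},
    {"event_id": "a", "type": "cart_update", "qty": 2},  # redelivered
]
unique = list(dedupe(stream))
print(len(unique))  # 2
```

In a real system the `seen` set would live in a durable store keyed per partition, but the access pattern is the same.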
[00:12:58] Unknown:
Now that real time data is at the point of democratization, more people are starting to embark on their own journeys of actually incorporating that into their product offerings or building whole applications around this paradigm. What are some of the new categories of the types of products and applications and experiences that you have seen being unlocked by this broader availability of the underlying technology? Yeah. I mean, I think 1 of the things that this unlocks is that it can
[00:13:27] Unknown:
move decision making and processing closer to the edge. Right? Because now, if I can get a granular kind of bite of activity coming in, I can, you know, do some action on that versus, alright, well, I gotta wait every hour to get a dump of the last hour of activity, then I gotta run some analytics, then I get to some tables and blah blah blah. And maybe 3, 4 hours down the line, assuming that I have everything already set up and automated, right, which most people don't, but, like, you know, 3 to 4 hours down the line, then I can actually, you know, give that result or provide some value back. And it's like, yeah, no, I can actually push that back a little bit further. Right? Or I don't have to wait. I don't have to choose either or. That's the other part too. Right? Where it's like, oh, am I doing this for analysis, or am I actually being proactive? Right? And, like, with the analysis part, you're looking backwards, but, you know, being proactive, like, you know, with the real time data, you're actually, you know, doing the stuff in time. So I think pushing the decision making and processing closer to the edge is really, like, 1 of the biggest benefits that we've seen that's kind of unlocked the next generation of, you know, kind of customer experiences and value. And that's really where people wanna be at. And it's not, yo, let's throw more infrastructure or more point products at the problem. It's, no, we already have a great foundation. Let's build upon that and start thinking more about how we could be more applicable and relevant to our customers. Because, I mean, that's what it is. It's the name of the game. Right? It's like, you know, whether your customer's external or internal, you're trying to use the data to tell a story or provide value to whoever needs it. Right? 
And so, you know, that's really the benefit of using us: pushing the decision making and the ability to do all those different types of things closer to the edge, and then not having to sacrifice, like, historical analysis for proactive, you know, customer engagement.
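The contrast described above, acting on each event as it arrives versus waiting for an hourly dump, can be sketched in plain Python. No specific streaming framework is assumed, and the event shape is hypothetical:

```python
# Batch style vs. streaming style over the same events.
# Illustrative only: no framework, hypothetical event shape.

from collections import defaultdict

def batch_totals(dump):
    """Batch style: wait for the whole dump, then aggregate once."""
    totals = defaultdict(float)
    for order in dump:
        totals[order["user"]] += order["amount"]
    return dict(totals)

def streaming_totals(events):
    """Streaming style: update state and expose a decision point per event."""
    totals = defaultdict(float)
    for event in events:
        totals[event["user"]] += event["amount"]
        yield event["user"], totals[event["user"]]  # actionable immediately

events = [{"user": "u1", "amount": 10.0},
          {"user": "u2", "amount": 5.0},
          {"user": "u1", "amount": 2.5}]

print(batch_totals(events))   # {'u1': 12.5, 'u2': 5.0} -- one answer, after the fact
for user, total in streaming_totals(events):
    print(user, total)        # running total after every single event
```

The final numbers are identical; what streaming adds is the chance to act after each event instead of hours later.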
[00:15:18] Unknown:
And with that experience of pushing more of that decision making to the edge, pushing the experience to being much lower latency, that also brings the possibility of increasing the risk profile because you have a much shorter window to be able to react to any bad data, any, you know, bugs that get introduced, any errors in kind of the source systems. And I'm wondering how that impacts the kind of appetite for risk and the overall risk profile of the applications that are being built and how that also influences the types of applications that companies are comfortable building as they first start to explore the space. I mean, that's not gonna change regardless of whether it's a data app or it's just a regular web app. Right? Like, you know, that risk profile is gonna be there. I think, actually,
[00:16:04] Unknown:
for us, because we are just plain regular code and we're end to end, you know, you can just write a unit test or a functional test. You don't have to set up a whole bunch of infrastructure to, like, do regular testing. Right? Like, it can be done locally on your machine before you actually deploy it and all that type of stuff. Right? Like, if you have good testing practices in general for your web app and, like, all that type of stuff, like, porting that over to Meroxa is super easy because it's just regular code, man. You know, you can use your Datadog to see the output of the functions and, you know, monitor, do all those things, Datadog or Splunk or whatever it is. Right? Like, it just literally fits into your existing workflows. So it hasn't changed, by and large, the types of apps that people are building. I think because we're giving them the guardrails to experiment, they're more willing to do some of the riskier things because, you know, they can have more repeated chances at bat. They don't have to wait 2 to 3 weeks for the data team to give an update or blah blah blah. Like, it fits within their existing software development life cycle. Right? And, like, I think that confidence and that foundation literally gives them the ability to be more risky and to do things that they didn't think were possible with their existing
[00:17:17] Unknown:
kinda architecture and stacks. And for any type of more kind of potentially sensitive data applications, or organizations that are in a more kind of rigorous regulatory environment, what are some of the types of technical controls that they should be thinking about, either in terms of validating the source data as it's coming into the pipeline or, you know, as it's traversing the pipeline before it gets delivered? Just kind of what are the available points of mitigation or overall strategies for mitigating some of those risk profiles? Yeah. I mean, you know, we run, you know, on premise, off premise, edge,
[00:17:56] Unknown:
hybrid, like, every combination that you could think of, and especially for some of these highly regulated environments. Right? Like, 1, you know, we're encrypted end to end. Alright? So as soon as we ingest something, we basically do, like, PKI at scale, right, which gets embedded in the key, you know, on ingest. And to access that data downstream, you need to basically have the key in order to do that. And then it's encrypted at the end. And that's regardless of where we're deployed and how we're deployed. Right? And the other piece of that, too, is that, you know, right now, because a lot of the work that we do is in the Department of Defense, it's multiclass. It's, you know, kinda all over the place as far as, like, connectivity and things like that. So we've really had to build in a lot of order guaranteeing, a lot of, like, you know, resiliency into the platform, a lot of, you know, all of that, so that way we can make sure that your data gets delivered.
Right? And, you know, just as a general ethos as a company, right, you know, we got 2 jobs. Never lose data. Never expose data. So I think, you know, 1 of the things, like, as we were going through compliance and I'm not, like, saying all this stuff so that, like, hackers come mess with us. Right? Like, you know, it's not like an open challenge, because I'm sure that there's something we missed. But, you know, we get regular pen tests. We you know, all the compliance stuff that we have to deal with. Like, we go overboard just to make sure that we never lose data and never expose data. So much so that even when we're troubleshooting, we don't actually get to see the details of the record. We just see that, like, hey, this source system sent x y z over, and it's, you know, these fields, these types. We don't actually see the values. Right? That's kinda 1 of the things. And, like, we give our users the ability to tune their caching or retention because, you know, underneath we use Kafka. Right? And so, you know, even though it's processed and all of that, there's a log record that goes through that. Right? And so, like, we're just making sure that we are, you know, doing right by our customers to make sure that, you know, those risks are mitigated as much as possible.
Because, I mean, you know, data comes from everywhere. And, you know, we do a lot of work to make sure that when you connect to something, it's gonna always stay connected. And then when you're, you know, moving that data and orchestrating it, it's always gonna be in the format that you need.
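The troubleshooting view described above, where operators see field names and types but never the values, might look roughly like this in plain Python. The `redact` helper and the record shape are illustrative assumptions, not Meroxa's actual implementation:

```python
# Sketch of a "never expose data" debugging view: replace every value
# in a record with its type name before it reaches an operator's logs.
# Illustrative only; the helper and record shape are hypothetical.

def redact(record):
    """Return a schema-only view of a record: field names mapped to type names."""
    return {field: type(value).__name__ for field, value in record.items()}

record = {"email": "jane@example.com", "balance": 1042.17, "active": True}
print(redact(record))  # {'email': 'str', 'balance': 'float', 'active': 'bool'}
```

An operator can confirm "this source sent these fields, these types" without the sensitive values ever leaving the pipeline.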
[00:20:21] Unknown:
And for developers who are building these real time applications, I'm wondering what are some of the technical capabilities, the architectural design kind of background that they need, some of the ways that this real time data introduces new architectural paradigms or new strains at the different integration points between systems, and just some of the overall application and system design process that needs to be brought into
[00:20:49] Unknown:
the process of being able to actually build these applications? Yeah. I mean, I would say, as far as the skill set, it could be, like, a junior engineer, because all you really need to know is where your data's coming from, where it's going, and what format it needs to be in when it gets there. Like, that's literally the access pattern for how you do that. It's like, you know, dot connect, dot process, and dot write. And you're, like, done. You know what I'm saying? And it's like, oh, that makes a lot of sense. And I think that, like, we tried to map that inside of our SDK. So if you know how to declare a method off of a, you know, object instance, you know, like, that type of stuff, you too can now be a data engineer. Right? Like, that's pretty much it.
So that's really the thing. And, like, just learning the architecture underneath like, that's our whole value prop. Like, you don't really need to know the architecture underneath. Because why? Right? Like, let's just say I'm at the New York Times. Right? Like, am I a distributed data company? I mean, not really. I'm a news company. So all I really want is, like, I wanna be able to get people to create content faster and, like, those sorts of things. I don't care about setting up a, you know, ingest service and airflow orchestration and blah blah blah. Like, that's a means to an end, and it shouldn't be the thing that you're spending the most time and resources on. And, like, that's really the value. So, yeah, I mean, a junior engineer, as long as they know how to declare methods, they can use Meroxa,
[00:22:25] Unknown:
and nobody should really have to worry about the underlying infrastructure. And as a service provider and a platform operator focused on this real time space, I know, as you've mentioned, that you're built on top of Kafka as your kind of core backbone of the system. I'm wondering, over the past couple of years, as you have matured the platform, kind of adapted the capabilities, added this interface on top to be able to simplify, or kind of abstract away, all of the internals for people, what are some of the overall kind of system design evolutions that have gone internally into Meroxa, and your perspective on the level of kind of maturity and capability of the kind of streaming technologies more broadly?
[00:23:09] Unknown:
Yeah. I mean, our architecture has changed quite a bit. Right? Like, we've these are changes that we knew to expect. Right? But 1 of the biggest things that we ended up doing was that we rewrote, our our Kafka Connect, and and we rewrote that in Go into a open source project called Conduit. And, you know, there there there there there were platform reasons and, you know, just kinda functionality reasons, why we ended up doing that. So 1, Kafka Connect runs on the the the JVM, and that's just a huge resource hog. 2, the connectors are just kinda all over the place as far as quality and and and efficiency.
We found out that, like, oh, snap. If you use the red chip connector, that's like a a gigabyte of RAM. Now, you know, magnify that across, multiply that across all of our customer base. Right? And it's like, you get all of these provisioned resources and not a lot of consistent, you know, throughput. And so I was just like, why are we why are we doing that? Right? Like, why do we have all these beefy boxes just for the Kafka Connect connector, and it's just it's just not super performant. Other thing too is is that, you know, we wanted people to be able to write connectors in their favorite languages.
And so, you know, to start building this this ecosystem like Kafka, and I get why they had to, I mean, Confluent, why they had to to, you know, kinda close down the ecosystem because of business reasons. But, look, at the end of the day, right, like, if I wanna build a connector, like, I shouldn't have to go through some weird esoteric kinda, you know, license approval process. Or if I wanna update a connector based off of my use case, I shouldn't have to, you know, be, subjected to to, you know, my vendors, you know, product road map like thing. Like, it just didn't make sense. So we built our own for for that reason. And then just on the operational side, like, we started to use that on our for ourselves. So, you know, every all of our service run services run on Kubernetes, all containerized, all that type of stuff. We have custom Kubernetes controllers that that, you know, help talk to our control plane to to, you know, do all the automation and things like that. But also 1 of the interesting things that we found out is, like, on Kafka Connect just randomly creates new topics get for some of the workflows that we needed to do, right, and lights, you know, be able to manage streams and some of the work work stuff that we have down the line with, like, stateful stream processing, we need to have a little bit more control over the life cycle of an event. Alright? And, like, Kafka Connect just didn't give that to you. And so, you know, it just turns out, like, oh, yeah. This this actually works. This is better for us. And now we can scale this better. We can, you know, customize it better. And it's just a better customer experience, down the line for for data integration. And that thing can sit alone, you know, be be be a standalone thing, or you're gonna put it inside of this giant infrastructure and, like, do amazing things. So, you know, we've got it, you know, copy the net, reza, go sing you know, standalone go binary, single binary. 
It's, like, 30 megabytes. We can deploy that anywhere.
Right? It can do all types of things. And so, I mean, it's super cool. That's probably the biggest architectural change that we've had. I mean, pretty soon — I think either mid this year or towards the end of this year — we're gonna 1.0 it. There's a bunch of connectors we've got for it. We even included a runtime so you can run your existing Kafka Connect connectors. But for us, that's really been our biggest competitive advantage. Because the other part of it is, we can generate super high quality connectors very, very fast. Right? And it's already Kafka compatible, all that type of stuff. So, yeah, it's pretty cool. And if you were to start over today,
[00:26:56] Unknown:
brand new, with the infrastructure and the ecosystem as it is now, I'm wondering
[00:27:01] Unknown:
what are some of the other architectural primitives that you would orient around, or ways that you would rethink the overall implementation of your stack? Yeah. I mean, I think one of the things we were looking at is doing the stateful stream processing — we probably could have done that a little sooner. Right? So right now, we're stateless. You can pass this stuff through, but a lot of the use cases around analytics — which is where a lot of people find the value — need stateful stream processing. So building Beam and Flink into the platform, that's something we're doing now, which also speaks to your last question about how the architecture has changed. We're introducing Flink as our stateful stream processing runtime, because of the customer demand and use cases that have evolved. You know, once we release that out to the public, again, you won't have to worry about the infrastructure. You can just do .window.aggregate.join.
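As a rough illustration of what the `.window`/`.aggregate` experience described here abstracts away, here is a minimal tumbling-window count in plain Python. The function name and the event shape are invented for this sketch; it is not Meroxa's actual API.

```python
from collections import defaultdict

# Hypothetical sketch of a tumbling-window aggregation -- the function
# name and event shape are invented for illustration; this is not
# Meroxa's actual API.
def tumbling_window_count(events, window_seconds):
    """Count events per fixed-size (tumbling) time window."""
    counts = defaultdict(int)
    for event in events:
        # Bucket each event by the start of the window containing its timestamp.
        window_start = (event["ts"] // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [{"ts": 3}, {"ts": 7}, {"ts": 12}, {"ts": 14}, {"ts": 21}]
print(tumbling_window_count(events, 10))  # {0: 2, 10: 2, 20: 1}
```

A real stateful stream processor such as Flink also manages state, checkpointing, and late-arriving events — exactly the operational burden being described in this part of the conversation.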
Right? It's pretty cool. Instead of, like, oh man, I gotta get this Flink job, I gotta do this, I gotta manage the schema, I gotta do all these things — it's like, no, no, no. It's just this. Right? Need to make sure I'm checkpointing it properly? Exactly — you ain't gotta worry about that. We'll handle all of that for you. And I think that's something that's super useful. The other thing I forgot to mention, because we do so much cool stuff underneath the hood: we'll have a .analytics thing that's basically, like, our own real-time store.
So for people that are probably used to using, like, ClickHouse or Druid or Pinot or something like that — again, you don't have to worry about it. We'll precompile the schemas for you, we'll do all of that, and make it easy for people to do these queries in real time as well. And just put it behind an endpoint. You go to whatever your data app name is, slash query, and you'll be able to write SQL and things like that. So for us, it's really about making that experience easier end to end, so that people can interact with the data as it is, in real time. And, you know, for me, I always say it's kinda like the phone system. We've had 7-digit, 10-digit phone numbers forever. The technology underneath it has changed, but the user interface part is always the same: I just dial a number, and I can pick up on the other end. Right?
And that's really what we're doing underneath the hood. We'll give you additional functionality, but you don't necessarily need to know how the sausage is made. We'll give you the ability to tune it and kinda choose your own flavors. But at the end of the day, for 90% of the world,
[00:29:48] Unknown:
you know, the base offering is gonna be okay. Absolutely. And you mentioned statefulness and being able to do windowing functions as something you're investing in now. Another aspect of streaming data that people will typically run into is wanting to actually join data across streams as they're traversing the different pipes, as well as being able to do transactional workflows, where I only want to commit this record to the stream if this other record is also committed at the same time. So, being able to do atomic operations
[00:30:22] Unknown:
on top of the streams, and I'm wondering what your level of investment has been on those types of capabilities as well. So for stateful stream processing — you just named a whole bunch of stuff that we're abstracting away. At the end of the day, the platform should be able to help you do a lot of that. But again, when you wanna join streams, what do you wanna do? Just talk out that algorithm: oh, I've got data source A and data source B, and I wanna join on a specific field or a specific key. That sounds like a method I could call. Give me, whatever it is, app.join — the first argument is stream 1, the second argument is stream 2, and the third argument is the key. And then I can just store that in a variable or a collection.
You know what I'm saying? Which one would you rather have? That's the experience that we're bringing out to the world. And then over time, as more paradigms get introduced to the underlying platforms, we'll add more functionality as we see fit. But for most people, all you need is .join or .aggregate or something like that. The maintenance and operation of that — just leave it to us, and we'll handle it. That's really the user interface that we're thinking about bringing out via code. You know?
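To make the app.join(stream 1, stream 2, key) idea concrete, here is a minimal batch-style sketch in plain Python. The function name, record shapes, and semantics (an inner join on a shared key field) are invented for illustration and are not Meroxa's actual API.

```python
# Hypothetical sketch of joining two streams on a key field -- an inner
# join over dict records. Names and semantics are invented for
# illustration; this is not Meroxa's actual API.
def join_streams(stream_a, stream_b, key):
    """Merge records from stream_a with matching records from stream_b."""
    # Index the second stream by the join key for O(1) lookups.
    index = {record[key]: record for record in stream_b}
    joined = []
    for record in stream_a:
        match = index.get(record[key])
        if match is not None:
            # Combine both records; fields from stream_a win on conflict.
            joined.append({**match, **record})
    return joined

orders = [{"user_id": 1, "total": 42}, {"user_id": 2, "total": 7}]
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 3, "name": "Grace"}]
print(join_streams(orders, users, "user_id"))
# [{'user_id': 1, 'name': 'Ada', 'total': 42}]
```

In a real streaming system, the right-hand side would be windowed or continuously materialized state rather than a fully loaded list — which is part of what the platform would manage for you.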
[00:31:47] Unknown:
And in terms of the API design, the interface, the ways that you document and communicate about the problems that you're solving — while trying to avoid getting too deep into the weeds about how those problems are being solved — I'm curious what your overall philosophy has been, your approach for actually building out those APIs, validating them, and ensuring that developers are able to intuit what they're actually doing, without saying, like, oh, I thought it was doing this thing, but it actually did this completely different random thing that I didn't want. Yeah. We don't want any, like, weirdo side effects from that stuff. But we start — I mean, this is just us being product focused. It's like
[00:32:28] Unknown:
we don't fall in love with the technology. We start with the experience and the user journey first and then work backwards to figure out, okay, what pieces of technology — or combinations of technology — can we leverage to drive the ideal user experience? So when I say .join or .aggregate, that's where we start. We do customer interviews and things like that, and we start thinking about, okay, what are the expectations of this output? What are the expectations of this high-level API? And once we get that in, that's when we go through the paces of, okay, this is how we surface that view. I think that's really the approach: if you're focused on the product and the experience, it's easier, because otherwise you build the thing first, then you go back and refactor and make it more performant, more secure, blah blah blah. With everything that we do, we just try to be mindful of that.
[00:33:36] Unknown:
And in terms of the ways that people are employing Meroxa and the types of applications that they're building, what are some of the most interesting or innovative or unexpected ways that you've seen it used? Oh, man.
[00:33:48] Unknown:
A lot of stuff I can't talk about, because we do a lot of stuff in defense. But, I mean, look, man — I wouldn't say a lot of it is super interesting from a use case perspective. There's a lot of data migration, a lot of transformations, going from legacy to kinda modern things. But I would say it's interesting because of the people who use it. The things they're trying to solve are interesting to them, and that's all I care about. As long as it's interesting to you. Like, real-time search indexing — is that gonna get people going, like, oh man, my inventory on my ecommerce website is always up to date?
You know what I'm saying? That type of thing. But one of the fun things — we had some people building apps, and you start to see people doing interesting stock watching and building models in financial services. And the ability for people to kinda do active learning instead of using some of these GPT things or generative AI things — instead of having to use billions of data points, they can build their own kinda data-driven, data-centric models and stuff like that. And the applicability of that for, like, risk mitigation and simulations and all that type of stuff — it's actually kinda cool to see. But, you know, that's the Fintech world; that's what they do. Not really interesting to me, but as long as you like it and you pay me every month to be able to do that — hey, I love it, man. So that's just kinda one of the things.
[00:35:41] Unknown:
And in your experience of building Meroxa, building out this product, exploring this problem space, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:35:51] Unknown:
Oh, man. You know, it's one of those things where we stand on the shoulders of giants. And what I mean by that is, we're systems engineers first, software engineers second, which means we'll curate existing components and try to put them together — and if it doesn't work, then we'll build something. And a lot of these very, very popular products in the open source world just have really bad developer experience. Or people don't necessarily think about scaling and those kinds of things. So we've had to make some changes — upstream changes to repos and things like that — because of people not understanding how these things would be used in real use cases. Which, again: it's a generic platform, it's an open source thing, people have their own windows and lenses into all these things. And so sometimes we'll get into things and it's like, yo, there has to be somebody complaining about this, because this legit does not make sense. We find ourselves running into that quite a bit. For better or worse, you'll read about a project and be like, oh, this sounds great, I can use it. And then once you start digging into the nitty gritty, it's like, oh man, there are some weirdo side effects or negative interactions that you might hit when you start integrating those things. So, yeah, from a technical aspect, that's one of the things we've learned.
And then just in general: talking to people and understanding how important data is to them on a day-to-day basis, and how shitty their experiences have been — that's super surprising. Like, you're okay waiting 3 months to get a result back for mission-critical stuff? The other big surprise is the amount of legacy stuff that's out there. I'm like, yo, whoever sold IBM DB2 — that person is probably in the salesperson hall of fame at some point. There's so much IBM DB2 out there. And I was just like, yo.
Why is this a thing? How was this a thing? It's just like, wow, man — the eighties called and they want their database back. So it's some of that stuff that surprises you. Like, y'all are still using it. All I hear in my day to day is, Snowflake is everywhere, Databricks is everywhere. Then when you go start talking to these enterprises, they're like, we literally just got on Hadoop — so now you're telling me there's this new thing that I gotta worry about? Whoa, pump your brakes, kinda thing. So I think that's one of the things — some of the things — that are most surprising for us. Absolutely. Especially on the legacy point: as somebody who, in my career, has been in the process of actually installing a brand new AS/400. That's a thing? That's a thing.
Wait. Wait. Wait. You said a brand new install of it. Like, on what machine?
[00:38:55] Unknown:
Right? Like — no, an actual AS/400. Like, that is the machine. They carted it in on a pallet and said, here you go. They still manufacture those? I mean, as of about 10, 15 years ago, yeah. Wild.
[00:39:08] Unknown:
That's why I'm like, you said a brand new AS/400. It's like, that is
[00:39:12] Unknown:
that's tripping me out right now, man. Okay. Yep. Yeah. I had somebody on my team who was the dedicated RPG developer.
[00:39:20] Unknown:
I will say this: I've written more COBOL and Fortran in probably the last year than I've ever had to.
[00:39:30] Unknown:
Yep. These things never die. They just go dormant. Never die. And so, for people who are exploring the space of real time, what are the cases where Meroxa is the wrong choice?
[00:39:41] Unknown:
There is none. No, no — I would say, until we get the stream processing stuff fully baked, there are some weird ergonomics around that type of thing, so you wouldn't use it for stateful stream processing, where you need to do joins and aggregations and windowing and blah blah blah. That's not the best experience for us right now. Everything else? Fair game. There's a lot of cool things that people are doing, and they find utility on our platform in a myriad of ways. So
[00:40:20] Unknown:
And as you continue to build and iterate on the product, what are some of the things you have planned for the near to medium term of Meroxa, or any particular problem areas that you're excited to dig into? Yeah, definitely. So, obviously, you keep hearing me say stateful stream processing,
[00:40:35] Unknown:
adding support for some additional languages like C# and Kotlin and maybe even Rust. Adding support for more connectors, and interoperability with other streaming systems like Pulsar and Redpanda and some of those things. I also think we're getting into the generative space. Much like everybody else and their mama right now with these dope coding experiences driven by generative AI, we will play in that space on multiple levels as well — including in our dashboard, where you'll be able to ask questions about the data as it moves in real time, and build the apps in real time. I think we'll have a pretty cool experience there. And then eventually start moving away from just the developer persona into different personas. Do we build our own REPL, kinda workbook-based flow, for data scientists and analysts? What does that look like for streaming data? That's not something people have had. What does data cataloging and data governance look like for streams? You've got it for static data, but what about data in motion? What does that look like? Some of those things are near term that we're thinking about. And again, it's really about building the best experience for engineers and people that are gonna use data to solve problems, answer questions, and innovate for customers.
[00:42:16] Unknown:
Are there any other aspects of the overall space of building real-time data applications and the work that you're doing at Meroxa to support it that we didn't discuss yet that you'd like to cover before we close out the show? No, man. I think we talked about a lot of it. Everybody needs to start thinking real time first. Real time first. That's it. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. Oh,
[00:42:49] Unknown:
it's just so fragmented, man. There's literally a point solution for everything. And I'm hoping that at some point there's gonna be consolidation — consolidation around, like, jobs to be done. But, man, there's just so much. I was talking to somebody the other day, like, I don't know if these companies are actually doing well or they're just posting content every day. They're posting content every day, and it's making people feel like, oh, I need this thing. But when you get into a large organization, you don't really need those things. That's the thing where I'm just kinda like, we've just created this demand. I don't know, man. That fragmentation is just killing us right now. So I don't think there is a gap — I think there's honestly too much. Maybe the gap is understanding interoperability between those systems, and how do we make that better?
How do we reduce the kinda operational and decision-making overhead that people take on when traversing each of those systems? Because the more things you add on, the more infinitely complex it gets. So, yeah, man, those are the kinds of things where I just think, if we all thought about this in a much better fashion,
[00:44:12] Unknown:
and there's some consolidation, the state of things for customer experience and providing that value can get a lot better. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Meroxa to help support real-time application developers. It's definitely a very interesting problem space, and it's great to see it evolve and mature. So thank you for all the time and work that you and your team are putting into that, and I hope you enjoy the rest of your day. I appreciate that, man. Thank you very much for having me
[00:44:39] Unknown:
on.
[00:44:45] Unknown:
Thank you for listening. Don't forget to check out our other shows, podcast.init, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Welcome
Meroxa's Evolution and Mission
Target Customers and Industry Focus
Real-Time Data Infrastructure
Challenges and Benefits of Real-Time Data
New Applications Enabled by Real-Time Data
Risk Profiles and Technical Controls
Developer Capabilities and Architectural Design
System Design and Evolution of Meroxa
Innovative Uses of Meroxa
Lessons Learned and Legacy Systems
When Meroxa is Not the Right Choice
Future Plans and Enhancements
Final Thoughts and Closing