Summary
There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Perrin was tasked with designing a new data platform implementation at PayPal and wound up building a data mesh. In this episode he shares that journey and the combination of technical and organizational challenges that he encountered in the process.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it—it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender where you can do two things: watch us build a data estate in 15 minutes and start for free today.
- Your host is Tobias Macey and today I'm interviewing Jean-Georges Perrin about his work at PayPal to implement a data mesh and the role of data contracts in making it work
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing the goals and scope of your work at PayPal to implement a data mesh?
- What are the core problems that you were addressing with this project?
- Is a data mesh ever "done"?
- What was your experience engaging at the organizational level to identify the granularity and ownership of the data products that were needed in the initial iteration?
- What was the impact of leading multiple teams on the design of how to implement communication/contracts throughout the mesh?
- What are the technical systems that you are relying on to power the different data domains?
- What is your philosophy on enforcing uniformity in technical systems vs. relying on interface definitions as the unit of consistency?
- What are the biggest challenges (technical and procedural) that you have encountered during your implementation?
- How are you managing visibility/auditability across the different data domains? (e.g. observability, data quality, etc.)
- What are the most interesting, innovative, or unexpected ways that you have seen PayPal's data mesh used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data mesh?
- When is a data mesh the wrong choice?
- What do you have planned for the future of your data mesh at PayPal?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Data Mesh
- O'Reilly Book (affiliate link)
- The next generation of Data Platforms is the Data Mesh
- PayPal
- Conway's Law
- Data Mesh For All Ages - US, Data Mesh For All Ages - UK
- Data Mesh Radio
- Data Mesh Community
- Data Mesh In Action
- Great Expectations
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- TimeXtender: ![TimeXtender Logo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/35MYWp0I.png) TimeXtender is a holistic, metadata-driven solution for data integration, optimized for agility. TimeXtender provides all the features you need to build a future-proof infrastructure for ingesting, transforming, modeling, and delivering clean, reliable data in the fastest, most efficient way possible. You can't optimize for everything all at once. That's why we take a holistic approach to data integration that optimizes for agility instead of fragmentation. By unifying each layer of the data stack, TimeXtender empowers you to build data solutions 10x faster while reducing costs by 70%-80%. We do this for one simple reason: because time matters. Go to [dataengineeringpodcast.com/timextender](https://www.dataengineeringpodcast.com/timextender) today to get started for free!
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Are you tired of dealing with the headache that is the modern data stack? It's supposed to make building smarter, faster, and more flexible data infrastructure a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it, it's all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to work properly. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70 to 80% on costs.
If you're fed up with the modern data stack, give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender, where you can do two things: watch them build a data estate in 15 minutes and start for free today. Your host is Tobias Macey, and today I'm interviewing Jean-Georges Perrin about his work at PayPal to implement a data mesh and the role of data contracts in making it work. So, Jean-Georges, can you start by introducing yourself? Hey. Thank you for having me, Tobias.
[00:01:23] Unknown:
I'm leading a few teams at PayPal right now, building this new generation of data platform, which happens to be a data mesh. Prior to that, I was a consultant for a long time, and an entrepreneur when I was in France before I moved to the US. And I've been more or less designing and implementing data platforms for about 15 years. So I think the data mesh is kind of the exciting one there now, and I'm really, really happy to be involved with it at PayPal.
[00:01:59] Unknown:
Absolutely. And do you remember how you first got started working in data?
[00:02:04] Unknown:
The first time I clearly started working in data was when I was in college, and I was learning about a database whose name I will not mention, but it's a "predicting the future" kind of name. And I was asking my professor at that time, really, what is the purpose of all these things? And why do we have to learn SQL? Because I didn't see anyone using SQL for their own business needs. And he said, yeah, but the admins can do that. And, basically, he was projecting what would almost become the citizen data scientist, and that was the early nineties, I would say. So that's how I came to data, and I hated it. To be honest with you, I hated it, and I was resisting it, resisting it, resisting it. And you know what karma is, right? It brings you back to data.
And early in my career, I worked for a fantastic company called Four Js. They made me love data. And I still try somewhat not to be associated only with data engineering; I like to think that I'm a data and software engineer, architect, or whatever, but I'm not opposing them. And I think one great thing with data mesh is that a lot of the software engineering principles that we've been using for the last 20 years, like agile, lifecycle management, etcetera, are finally hitting the data world. I may be a bit crude when I'm saying that, but I think it's great. And the data mesh is kind of structuring all that together.
Okay.
[00:03:47] Unknown:
And so that brings us to where you are now, where you are working at PayPal. You're implementing a data mesh. Obviously, data mesh has gotten to the point where it's starting to mean different things to different people. So, maybe we can just start by talking about what was the motivation for PayPal to decide we want to implement data mesh? Did they even use those terms? And what were the overarching goals and scope of work involved in actually undertaking this project?
[00:04:16] Unknown:
They actually did not. So I joined PayPal in December of 2021. And, you know, PayPal is a big company, a little less than 30,000 people. I'm attached to fraud, risk, and credit. So not the technology part: when you're going to paypal.com and you do your transaction, that's not us. I'm more about making sure that your account is safe, that the PayPal assets are safe. And so we need a lot of data, of course, to make sure that we are monitoring all that. So that's where I am. And we use a lot of data, as you can imagine; PayPal is probably the definition of a data-driven company. But the data is not as good as we want it right now for performing all these checks and balances.
So we needed to do something else, which was not called a data mesh at that time. When I came in, I got a long list of requirements: hey, you've got to build this data platform, we are going to specifically address those topics, go with it. And I came back with a vision, implementing the strategy about the data platform itself. It was a platform we wanted to build for our data scientists, and that was back in January of 2022. And in March, we kind of realized that, hey, what we're building is really aligned to what Zhamak Dehghani was describing in her book, called a data mesh. I think it's still a very new concept, and we had never connected the dots. We didn't say, oh, what we're building is a data mesh, until March of 2022.
It was precisely on Pi Day of March 2022. And then it made our life a lot easier. The thing is, we realized we were actually building a data mesh, or on the way to building one. And we said, well, if we follow what Zhamak is writing in her book and her talks, etcetera, it's going to simplify our life in a great way. And that's when we decided, okay. It was not like a 90-degree shift or a 180-degree shift; it was really realigning our vision to the one of the data mesh. And we gained a lot by doing that. So it was really not, let's say, about winning over leadership at PayPal and saying, hey, we want to build a data mesh. We needed the platform; the data mesh happened to be really the paradigm we wanted to follow. And I think it worked out pretty well.
[00:07:13] Unknown:
I don't remember if it was on LinkedIn or on your website or where it was, but I saw you say something along the lines of, you know, I built a data mesh over the course of the past year. And that opens the question: is the data mesh ever done? What does it mean to have completed an implementation of a data mesh?
[00:07:32] Unknown:
Well, it means that if I run away right now, they can still use it. I don't want to give anyone bad ideas, but it's a product, okay? I think one of the principles of data mesh is, you know, the data product, and it's bringing all the benefits of what a product is to data. And what we've been building with the data mesh as a platform is a product. So we keep adding features to it based on what the users want, what we wanted to build in, what compliance wants us to do more of. So, yeah, as you said, it's a never-ending story, right? Why does Microsoft keep selling new features in Word and Excel? It's still the same product, but they keep adding new features.
And so far, we've been lucky that our users are asking for more features.
[00:08:26] Unknown:
Absolutely. And another interesting aspect of the overall principle of data mesh is, one, you know, is there a smallest unit below which it stops making sense to build a data mesh? So if you're a small data team and you're just building a kind of single point solution, is that just the first node in the mesh, or is that premature? And for a company at the scale of PayPal, if you're building a data mesh within a particular, you know, domain or bounded context or set of bounded contexts, does that then start to become the root of an infection that spreads throughout the entire company, where all data systems end up becoming nodes in the data mesh? What are the ways that you think about how to define the containing boundaries of where the mesh ends and the rest of the other platforms begin? Where's my lawyer when I need him?
[00:09:18] Unknown:
Okay.
[00:09:20] Unknown:
And so, another interesting element of the data mesh is, you know, is there a certain minimum granularity for there to be a data mesh? Do you have to be beyond a certain size for it to make sense to even think in those terms? And for a company the size of PayPal, where you have so much data and so many different problem domains, is your implementation of a data mesh within the space that you're focused on just the, you know, root of the infection that is then going to spread throughout the entire company and take over all of the data domains? It's really a great question. So, as you know, one of the other principles of the data mesh is the principle of domain ownership.
[00:10:00] Unknown:
So we don't always agree, whether inside PayPal or outside PayPal, on the definition of what a domain is. What we decided is that the smallest element, which is a data product (the implementation being a data quantum), needs to deliver value. That's our goal. When we deliver value, then we can build them, and later they can be meshed. So now, we started in our business unit, and more people, even outside of our business unit, are interested in that, to a point where our enterprise systems are also interested in what we've been doing. So whether it's going to be our data mesh that becomes an enterprise product, or there's a different strategy, we'll see. But at PayPal, we couldn't do that without their support. We've got enterprise-level data governance and data platforms, and they've been really great allies for us to be able to build that. So, is it an infection and everybody's going to consume it?
Or consume from it and produce to it? Personally, if you ask me, I think I'd love it. I think the benefits are really overwhelming and beat the problems, but I can't tell you that for now. And that's really honest; it's not like I know and I won't tell you. I just can't tell you. But so far, the reception of the data mesh has been pretty great, and people that have been using it, have seen it, have seen demos of it, are really loving it. They keep asking for more data in it. So yes, we need more data products. And it doesn't mean that when you start using our data mesh, you cannot mix it with data which is not in the mesh. That's always possible, but people want to have the same flexibility for their other products.
[00:12:01] Unknown:
As you were going through the process of saying, okay, we need to build a data platform to solve this problem, and then coming to the realization of, hey, what we're actually building really looks a lot like this concept of the data mesh, I'm curious what your experience was working at the organizational level. Particularly, once you said, oh, this is a data mesh, we need to figure out what those bounded contexts are: how did you approach the business owners within the problem space that you were addressing to ask, what is the appropriate level of granularity for these different data products? Who is going to own these data products? Who are the people who need to be involved in the decision making around what data is produced, what it looks like, how it is generated, how it is consumed? Just that kind of people and organizational and process question about how to actually go about building the technical bits.
[00:12:49] Unknown:
The way we approached this issue was: we couldn't go to a producer and say, hey, congratulations, you've been promoted, you're now a data product owner. I wish it were something that simple. So we have one of our team leads, a peer of mine, who's really done an awesome job at being, I would say, a data product manager, combining that with the role of a data product owner. And he's been managing the data engineering teams that are actually currently building all the data quanta we have. So that's how we... you know, the idea of the data mesh is to decentralize.
We almost went back to a little bit of centralization with my friend there, because we want to capture his experience and we want to be able to make sure that this is something we can reproduce, to build a reproducible process. I think we're still in the experimentation phase. I hope we're going to, at some point... yeah, I think it's always going to be an experimentation phase, to be honest. But I think that capturing that to build a process is where we are right now. And I think it's a great combination of talents. He's got a great team, he's a great person, and we just keep exchanging and building from there. Another
[00:14:21] Unknown:
element of what you're building at PayPal is that, in your current role, you're managing across multiple different teams. And one of the interesting aspects of software anywhere is, I forget which principle it's called, but the fact that whatever systems you build are going to mirror the communication patterns of the organization. And so, because you are federating this work across multiple different teams, how did that end up translating into the actual implementation details and the communication patterns between these different data products and data domains? What are some of the ways that you were consciously thinking about that aspect of this work in this problem space, and maybe some of the ways that it incidentally crept into the final product?
[00:15:02] Unknown:
So at the height of the project, last summer and fall, I had five teams working on the project. I had two teams working on the platform. I had one team keeping the lights on in terms of data engineering, building the different data quanta. And I had another team which was very tactical in terms of remediation, building tools, etcetera. And the fifth team was architecture; I had a small team of architects. So this was the design of the teams. I don't know if you were referring to Conway's Law, but we were not reflecting the business within PayPal. That was given to a product organization.
And I was mentioning my friend; I think we should name him: Guna. So Guna was actually running what we were doing from a product perspective, really like a product organization in a software organization. And that allowed us to capture the expertise from the different people in the business and build it into our products. So right now we've got six data products. We will probably have, by the time this airs, another 40. So it goes much faster at releasing them now. And that's also why we changed the organization a little bit, where now all the data engineering teams that are working on building those data quanta are reporting to Guna. It makes them more efficient and more aligned to the business unit. And they directly work with the different business units to capture the essence, I would say, of the value of the data, of the definition of the data, of the governance of the data, to put them in data contracts, which makes it a very rich set of information that we can actually capture and feed governance
[00:17:08] Unknown:
at various levels of the company. The question of data contracts is definitely one that I want to dig into. And just as a kind of intro to this topic, it's definitely something that has gained a lot of attention in recent years, even in just recent months. And it's also one that, just as with data mesh, has started to mean different things to different people. So I'm wondering if you can talk through some of the ways that you think about the purpose of contracts in this data ecosystem and some of the ways that they're implemented. Is it just a verbal contract of, hey, I'd really like it if you didn't send me broken data? Or is it a technically enforced contract: these are the specific fields that need to be present, these values need to be within this range, this value needs to be monotonically increasing? I'm just wondering what level of detail and rigidity you are adopting for your contractual approach to these domains.
[00:18:02] Unknown:
So there's a bit of all that. And we're trying really hard to make it comprehensible by everybody. The first thing is, the data contract is not new at PayPal. Our enterprise data governance team had data contracts. They had, I would say, a less structured version of a contract: more like a Word document or Excel document, in some way. This is what governed our data. What we changed was to make sure that the data contract has all the information that our data governance needed.
And in addition to that, to make it computer-readable. So basically, it became a YAML file. And this YAML file, the data contract, is used between the producer and its consumers. The term contract is a little bit wrong, in a way, because the only guaranteeing party is the producer. The producer gives the contract to the consumers; if the consumers don't want it, well, too bad. So I was almost thinking at some point of changing it to "brochure." It's more like a marketing brochure for your data product. You're building products, you've got a brochure, and it contains all the information you've got, with the technical specs, with the SLAs, with the data quality, etcetera. But it's not a contract in the sense that the other signing party has anything to say. It's a bit like when you're looking at the SLA on, I don't know, AWS: if you're not happy with it, too bad.
So, in that way. But we still use the term data contract. I think everybody kind of figures out the concept of a contract. A lot of the time we sign contracts where we don't have a choice; that's how it is, right? So it's strong, it's binding. It defines the schema which a producer is giving to the consumer. But it defines a lot of documentation inside as well. The idea is not to make a dump of a schema, give it to the user, and say, hey, that's it. It's also about the richness of the documentation, the richness of the definition of the fields, the links to authoritative services: for example, when you've got a data transformation, it adheres to someone in the company saying, hey, this is how it should be transformed, and there's a link to it. It adds things like ownership, with the owner, the architect, the different stakeholders.
And rights: the rights you need to have to be able to use it. And you can imagine that data security at PayPal is something that is kind of sensitive, so that's reflected in the contract as well. So it's a very rich document. I can tell you that once the data engineers have got to fill it in and build it, I became their worst enemy. We're working on making it easier. But it's a tough document, yeah, and very binding.
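To make the idea concrete, here is a hedged sketch of what such a machine-readable contract and a check over it might look like. This is not PayPal's actual format: the section names, field names, and validation rules below are invented for illustration only. The point is that once the contract is structured data rather than a Word document, the platform can verify that the schema, documentation, SLAs, and ownership are all present before a data product ships.

```python
# Hypothetical, minimal data contract, expressed as structured data.
# Section and field names are invented; real contracts carry far more
# (security classification, links to authoritative services, etc.).
contract = {
    "dataset": "fraud_transactions",
    "version": "1.2.0",
    "owner": {"team": "risk-data", "architect": "jdoe"},
    "schema": [
        {"name": "txn_id", "type": "string", "required": True,
         "description": "Unique transaction identifier"},
        {"name": "amount_usd", "type": "decimal", "required": True,
         "description": "Transaction amount, normalized to USD"},
    ],
    "sla": {"freshness_hours": 3, "retention_years": 3},
}

# Sections every contract must carry before a data product can be published.
REQUIRED_SECTIONS = {"dataset", "version", "owner", "schema", "sla"}

def validate_contract(c: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract is well-formed."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS - c.keys()]
    # Richness of documentation is part of the contract: every field needs a description.
    for field in c.get("schema", []):
        if not field.get("description"):
            problems.append(f"field {field.get('name')} lacks documentation")
    return problems

print(validate_contract(contract))
```

A platform-side check like this is one reason a structured contract beats a document: "no data product without a data contract" becomes something a pipeline can enforce rather than a convention.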
[00:21:24] Unknown:
Yeah, it's definitely interesting, your point about the question of: as a contract, is it a push-based contract or a pull-based contract? Who is the party in power in that situation? It sounds like the person who is actually producing the data is the person in power: you don't like it? That's too bad, this is what we're providing, and you just need to make sure that you're able to consume it. But even that is valuable, because too many times there is no explicit contract about what you're producing and how you're supposed to work with it. And so it becomes the responsibility of the consuming party, no matter what, to do all the extra work of figuring out what the implicit guarantees are, and whether they are even guaranteed at any level. Exactly.
[00:22:05] Unknown:
I think, you know... I wouldn't just say it's better than nothing; it's definitely better than nothing, but it's also a guarantee for the user based on their use case. Think about it: I'm consuming your data, I need to be running some regression over 10 years of data, but your contract says that your retention period is only 3 years. At least I know that the retention period is 3 years. Before, I just guessed it, doing a select of the oldest one and figuring it out. So that's now explicit.
And that's binding for both parties when you guarantee some kind of freshness of the data. Right? I want data in my system that has not been produced, like, 10 days ago, but, like, 3 hours ago. So that's also described there. And I think, for the consumers, they were looking for that kind of information. Every time our data engineering team was delivering a dataset prior to the data mesh, we were documenting it in a very standard way, in a very professional way. But it was still a document. There was some versioning possible, etcetera, but it was not directly linked to the datasets. Right now, in the data mesh, it's welded: the data contract is welded to the data product. There's no data product without a data contract.
What's really interesting in that is you can build a lot of different tools based on the data contract, because it's so rich. And that's why our users love it: they can find the data, they can find details of the data they could not find in the past, and they finally understand what an SLA is and how it impacts their job. It's not just, you know, a bunch of nines; it's more precise than that. So there are a lot of benefits to that. The way we implemented our data products, our data quanta, is that you've got at least one data contract for the output, but we've got data contracts for the input as well. So, you know, it's like on a factory floor: you're building cars,
and you're testing the parts that are coming in from your suppliers. We're doing the same thing there, and we've got a data contract that is checking what is coming in. This inbound data contract is still done by our teams most of the time, because we're very young in the mesh, but our hope is that it is going to be built by the upstream systems as well. So that's our next step.
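The factory-floor analogy above can be sketched in code. This is a hedged illustration only: the field names, SLA thresholds, and record shape are invented, and in practice a data quality tool such as Great Expectations (linked in the show notes) often plays this role. The idea is an input-side gate that rejects a batch of incoming rows when it violates the inbound contract, here freshness and required fields:

```python
from datetime import datetime, timedelta, timezone

def check_inbound_batch(rows, max_age_hours=3, required=("txn_id", "amount_usd")):
    """Test incoming 'parts' against the inbound contract.

    Returns a list of violations; an empty list means the batch passes.
    The freshness SLA and required-field list would come from the contract itself.
    """
    errors = []
    now = datetime.now(timezone.utc)
    for i, row in enumerate(rows):
        # Freshness: reject rows older than the contract's SLA allows.
        if now - row["produced_at"] > timedelta(hours=max_age_hours):
            errors.append(f"row {i}: stale (freshness SLA is {max_age_hours}h)")
        # Completeness: required fields must be populated.
        for col in required:
            if row.get(col) is None:
                errors.append(f"row {i}: missing required field {col!r}")
    return errors

fresh = datetime.now(timezone.utc) - timedelta(minutes=30)
batch = [
    {"txn_id": "t1", "amount_usd": 10.0, "produced_at": fresh},
    {"txn_id": None, "amount_usd": 5.0, "produced_at": fresh},
]
print(check_inbound_batch(batch))
```

Running such a check at the boundary of the data quantum means a broken upstream feed is caught before it propagates into the product, which is exactly the "testing the incoming parts" posture described above.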
[00:24:54] Unknown:
And the topic of tooling around data contracts is one that we could probably spend a whole other episode on, so I will kind of table that for now. But another interesting aspect of the idea of building a data mesh, having these bounded contexts in which these different data products are produced and the different interfaces that they make available, given that different people own the end-to-end workflow of data within each boundary: what do you view as the trade-offs of strict technical uniformity in terms of the tools and systems that are available for building and producing those data products, versus allowing each owner of a particular domain to own their own destiny and say, well, we need to be able to use this, this, and this tool for this, this, and this reason for these types of data, and over here we're using tools b, c, d, and e instead? What are the trade-offs of the economies of scale because everybody's using the same tools, versus the constraints that are imposed if everybody has to work on the same systems and doesn't have the ability to bring in their own capabilities to meet a specific need?
[00:26:07] Unknown:
So, if you're looking at the implementation we've done from a platform perspective, we really follow the sidecar approach. All the data mesh-specific tooling is a sidecar, which is talking to the low-level implementation. So when we are building a data product and there's an internal transformation, there's internal modeling, there are all these best practices that the data engineers know and master, well, I'm not going to tell them, hey, you can't do that anymore. It's a poor dad joke, but the internal joke is that if you want to do your data transformation in Perl, I'm not the one who's going to stop you. You may have a problem with your manager, but I'm not the one going to stop you. So, really, the only thing we're asking people is to follow the data contract and the APIs we published.
That's what guarantees the interoperability of every data product, every data quantum, within the mesh. Otherwise, you cannot build your mesh, right? If someone is using a different data contract for a different data product, then it's going to be a little bit messy at the end. So we are not enforcing low-level stuff; we are enforcing the interface, basically. Mhmm.
[00:27:35] Unknown:
In terms of the challenges that you encountered in the process of going through this journey of, okay, I need to be able to build this data platform; oh, hey, it's actually a data mesh; now I need to align everybody along these paradigms of figuring out what the proper domain boundaries are, what the interfaces are, how we define and enforce contracts: I'm curious, what are some of the technical and procedural and social challenges that you've had to overcome along that journey?
[00:28:01] Unknown:
So that's where I need my lawyer. With all the kids we had to kidnap and all the blackmail we had to do... no, I'm just kidding. It was not easy. Okay? I wouldn't lie. We are a big organization. We've got roadmaps. We're a rather small business unit. So we had to find compromises. We had to sell our additional value. We had to adapt to the ways of working at PayPal. And I was new there. I still consider myself new. And my boss, my great boss, was rather new as well. So we made some mistakes, and we probably hurt a few feelings here and there.
But finally, it came together as we were able to demonstrate the value of what we were building. So the journey over the last year was really interesting. For our first prototype of a data mesh, we had something running in May. So you see: hired in December, vision established in January, realizing that, oh, we are building a data mesh, in March. We were already building stuff at that time. And in May, we had a POC, let's say. From May to just before Christmas in December, when we went to production, of course we added a lot more features, but there was this real effort in partnering with people within PayPal to make sure that we matched security, we matched the ways of working, and we matched the existing systems. Okay? So it was interesting. I would say it has been a little bit hard at some points. And my background has been a lot in startups, where it's, hey, let's do it this way, we don't ask for permission, we just roll it out.
At PayPal, we had to be a little bit more conscious about the environment we were in. But having said that, I think that if we had started by trying to follow all the rules and had not developed our POC very early on, I don't think we would have been able to actually demonstrate the value. So there are pros and cons in everything you're doing. Right? But it's been a great journey, honestly. Even if there was a bit of, you know, teeth grinding and things like that at some points, I think we all learned from it.
[00:30:32] Unknown:
Another interesting element of the idea of data mesh and having these discrete data products is the discoverability of what those products are, what they do, the SLAs around them. So, going back to your notion of the contracts being a brochure, I'm curious how you are managing the brochures and the catalog: this is the data that we have, this is how you can consume it, these are the guarantees around it. And then, going deeper within the boundaries of those data domains, how are you managing things like observability, data quality monitoring, making sure that you're actually staying true to the promises that you're providing in those SLA documents, etcetera?
[00:31:16] Unknown:
So we did not introduce SLAs or data quality at PayPal. Okay? PayPal had had that in place for a long time, and it's self-service, which is also one of the big parts of data mesh. All of that was already existing at PayPal. And very early on in the project, we decided to partner with the team that was in charge of all the data quality rules and execution, to make sure that it was kind of standard. The only difference between us and them was that they had a centralized vision of it, and we had a more decentralized, business unit vision of how to use the tools. So we just had to overlap the two visions. We made it, and voila. Okay? It was there. It was working. And we're continuing this partnership by developing more APIs on their end that we're consuming on our end, pushing stuff to their end, etcetera. And that's just an example for data quality, but it's the same thing for every other element that was already existing at PayPal. As much as it was a development effort, it was also an integration effort within all the great tooling we have. So it allowed us to gain time and to be able to ship well on time and on budget.
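One way to picture that "centralized rules, decentralized use" partnership is a shared rule registry that each data product team consumes on its own terms. This is a hypothetical sketch; the rule names and the `run_quality_checks` helper are invented for illustration and are not PayPal's actual system.

```python
# Hypothetical: a central registry of quality rules, owned by one team,
# applied independently by each decentralized data product.
CENTRAL_RULES = {
    "no_negative_amounts": lambda rows: all(r["amount"] >= 0 for r in rows),
    "ids_unique": lambda rows: len({r["id"] for r in rows}) == len(rows),
}

def run_quality_checks(rows, rule_names):
    """Each data product picks which of the central rules apply to it."""
    return {name: CENTRAL_RULES[name](rows) for name in rule_names}

payments = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 4.50}]
report = run_quality_checks(payments, ["no_negative_amounts", "ids_unique"])
print(report)
```

The design point is the split of responsibilities: the central team curates the rules once, while each business unit decides where and when to run them, which mirrors the overlap of the two visions described above.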
[00:32:46] Unknown:
As far as the experience of building the data mesh, working with your teams, working with the business: now that you have things underway, you're continuing to produce new data products, and you've established the baseline patterns and principles. What are some of the most interesting or innovative or unexpected ways that you have seen the data mesh used, and ways that you have seen the introduction of this concept start to take root in other teams outside of your purview?
[00:33:14] Unknown:
So kind of the unexpected thing was how easy it was to explain what a data product life cycle is, and to be able to demonstrate it. Because it's a newish concept in data engineering, I was not expecting adoption. And what really surprised me was that people would say, wow, that's it? Okay. I didn't have to make twenty slides to explain what the data product lifecycle is. One was just enough, and people would just get it. I think that was one of the big surprises. Now we're a little bit on the flip side of the success, which is that everybody wants more data in the data mesh. Okay? Everybody wants more. I'm a producer of this data, I want it there. I'm a consumer of this data.
I want to be able to access it there as well. This push was something I was not expecting at this level of intensity. Okay? It took us about a year to make six data products, and in two months, we're making another forty. And that's also a reason why we had to split the teams: we need to be prepared for growth. Which is great. Love it.
[00:34:35] Unknown:
In your experience of going through this journey, building out this data mesh practice and the foundational technologies around it, what are some of the most valuable or interesting lessons that you learned in the process?
[00:34:47] Unknown:
At some point in the summer, I was kind of beating myself up, saying, oh, we should have started with more integration directly with PayPal tooling, much earlier on in our partnership. We were hitting some walls, and I would say, oh, maybe we should have done it differently. But in retrospect, I think that building it first and then integrating it was a good lesson. And if I had to do it again, I would probably do it the same way. So that was something I was not expecting. I beat myself up over it, but in the end, it was pretty positive.
[00:35:31] Unknown:
And for people who are curious about the potential for data mesh in their organization, what are the cases where you would say that it is the wrong choice and you'd be better off with a different architectural pattern?
[00:35:45] Unknown:
I think right now, we're really looking at data mesh in terms of analytics. If your use case is not analytics, I would probably not suggest it right now. We have also started to talk about it in operational terms, and not only read-only operational. That might come for us at some point, but it's a bit of a stretch right now. Okay? So data mesh might not be the solution for everything. The data contract, on the other hand, might be the solution for everything, and that's probably something we need to think about in a little more detail.
[00:36:20] Unknown:
And as you continue to iterate on the different data products that are being produced and the foundational principles of how to build and maintain the data mesh itself, what do you have planned for the future, or any particular projects or problem domains you're excited to dig into?
[00:36:36] Unknown:
I've got a mind map with all the features on my left screen here. It's huge, okay, in terms of features we want to add. There are two that really excite me. One is the BI integration. We followed persona-based development. Okay? The first part was really to address our direct consumers, data scientists and data analysts. We defined a persona specifically around them to identify what their needs were and to fulfill them. The second persona was the data engineer. The data engineer, for us, is the person who is going to be building the data product. And of course, it would be oversimplifying the data world to say that we've got two personas and that's it. Right? So we've got more personas coming and more being identified.
One which really excites me is all the people doing BI: making sure that they can interface directly from their BI tool to the data products, and not having, oh, I'm going to copy my data from the data product into my Tableau, fill my QVD, and blah blah blah. Okay? That's not the idea. Instead, we want this direct link, so I keep the freshness, I keep the high level of quality of the data I want. So that's one we're starting to work on. Another one I love a lot is the mesh operations center: making sure that we interface with the right tools. I don't think we're going to build, you know, the NOC version of a data mesh for data mesh, but more making sure that we interface with the right tools and the right observability from a platform perspective, to make sure that it runs 24/7.
What's really interesting is the growth. Okay? When we designed it, we designed it with growth in mind. I always told my teams, hey, think that we're going to have thousands of data quanta to manage. It's not going to be three. We're not going to build this for two domains. We're going to have thousands to manage, even if it stays within our BU. Because of that, we are taking the approach of, okay, we need this mesh operations center, we need tooling that is capable of managing all these quanta. And Zhamak, in her book, when she described the different planes of the data mesh, described the mesh experience plane, which is really about managing all these data products, and not only managing. So we've got our capability map. We know where we're going. We've got our roadmap. And as we've proven that it brings value to the company, the company is also backing us more. So it's all great, honestly. It's a fantastic journey.
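The mesh operations center idea, sweeping thousands of data quanta and flagging the ones that are breaching their promises, can be sketched in a few lines. This is a hypothetical illustration only: the `QuantumStatus` record and the `sweep` function are invented, and a real implementation would poll each product's observability endpoint rather than use in-memory stubs.

```python
# Hypothetical "mesh operations center" health sweep over data quanta.
# Real code would call platform observability APIs; statuses are stubbed here.
from dataclasses import dataclass

@dataclass
class QuantumStatus:
    name: str
    fresh: bool       # data delivered within the contracted freshness window
    quality_ok: bool  # quality rules currently passing

def sweep(statuses):
    """Return the names of data quanta breaching their contract promises."""
    return [s.name for s in statuses if not (s.fresh and s.quality_ok)]

mesh = [
    QuantumStatus("orders", fresh=True, quality_ok=True),
    QuantumStatus("refunds", fresh=False, quality_ok=True),   # stale
    QuantumStatus("disputes", fresh=True, quality_ok=False),  # failing rules
]
print(sweep(mesh))
```

The point is that the sweep scales with the number of quanta automatically, which is what makes designing for thousands of data products, rather than three, tractable.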
[00:39:47] Unknown:
And for people who are interested in going on their own journey of building out a data mesh, what are some of the potential pitfalls that you'll call out, and any particular resources that you found most helpful? Obviously, Zhamak's book, but I don't know if there's anything else that you wanna call out.
[00:40:05] Unknown:
Well, now you've got my own book, okay, which is Data Mesh for All Ages. It's sixteen pages, and you know everything in sixteen pages with big fonts and a lot of illustrations. But more seriously, and I don't want to advertise for a competitor podcast, what Scott Hirleman has been doing with the Data Mesh Radio has been pretty awesome for learning this. What he's been building around the community, and he is now passing the community to a more structured organization. This has been a fantastic help as well: meeting all the practitioners, exchanging.
Yes, Scott has been great for the community. There are also books. There's a book from Manning, Data Mesh in Action, which is pretty good as well. Okay? It's a bit more pragmatic, I would say, than Zhamak's book. So it depends where you want to learn from. There are a lot of videos as well. There are a lot of podcasts. There are a lot of different resources you can find. But the community has been awesome as well. So it's interesting to see how it goes. Right? When you look at the development of open source, for example, people were attached to a product; here there's no product. Okay? Just a bunch of ideas put together by Zhamak. Well, she's doing something else now, but still, it's a bunch of ideas she put together in a book. And she managed to create an important, almost a fan base, around data mesh.
So it's going to be interesting to see how that evolves. I try to be very careful, in the resources I find, to stay away from vendors. We've got, for example, a great partnership with Google and GCP, so that's pretty awesome. But you've got to be careful. A lot of the vendors are trying to jump on the bandwagon of the data mesh and say, hey, I've got a data mesh product. I'm not saying they're wrong. What I'm saying is they're first selling their product before they're selling the paradigm of data mesh. So educate yourself, and then find the right product that will help you build your own data mesh. But there are some products out there that would be great for implementing data mesh. I'm thinking of Great Expectations, for example, when it comes to all the data quality issues. We have our own framework at PayPal, but if you don't have one, just check this one out, okay, and integrate with it.
You may know that I love Spark. I still love Spark quite a bit. Spark is a great data transformation engine, so you can rely on using Spark for that. And that's also a bit the idea there: you don't have to adhere to a single vendor. Okay? If you want to use Watson Knowledge Catalog, for example, for your cataloging features, just use it. Build the APIs that are going to communicate with that. Build your internal pipeline with the tools you know. You don't have to completely brainwash yourself to go to data mesh.
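The "build APIs around the vendor, don't marry the vendor" advice above can be sketched as a small adapter pattern. Everything here is illustrative: the `Catalog` interface and the in-memory stand-in are invented for the example, and a real adapter would wrap a vendor's API (Watson Knowledge Catalog or anything else) behind the same interface.

```python
# Sketch: mesh code depends on a Catalog interface, never on a vendor SDK,
# so the cataloging tool can be swapped without touching the mesh.
from typing import Protocol

class Catalog(Protocol):
    def register(self, product: str, description: str) -> None: ...
    def lookup(self, product: str) -> str: ...

class InMemoryCatalog:
    """Stand-in adapter; a real one would call a vendor's REST API."""
    def __init__(self):
        self._entries = {}

    def register(self, product, description):
        self._entries[product] = description

    def lookup(self, product):
        return self._entries[product]

def publish_product(catalog: Catalog, name: str, description: str):
    # The mesh only knows the interface; the adapter hides the vendor.
    catalog.register(name, description)

cat = InMemoryCatalog()
publish_product(cat, "orders", "Daily order facts, v1.2.0")
print(cat.lookup("orders"))
```

Swapping vendors then means writing one new adapter class, not rewriting every pipeline that publishes to the catalog.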
[00:43:14] Unknown:
Are there any other aspects of your experience of building a data mesh implementation at PayPal or the lessons learned or the technical or social aspects of making it work that we didn't discuss yet that you would like to cover before we close out the show?
[00:43:27] Unknown:
Yeah. I think you've got to think about your developers. Okay? Your developers are coming in new; needless to say, it's going to be pretty difficult to hire someone with five years of data mesh experience right now. There's what I think Scott and many others call the cognitive overload. We invested a lot in training our developers, and in training our users, in what the data mesh is. Okay? There was a lot of talk, sometimes the same one repeated several times, or tweets, or things like that. We really invested a lot in that, to make sure that people were starting to understand it. And you would be surprised, or you wouldn't be surprised, that it takes some time.
I mean, it takes some time for people who are used to thinking a certain way to change their thinking to a new way. It's all doable. It takes a little time. Don't underestimate it. And, to be honest, all the engineers in the five teams I have, even the three I have now, are all passionate about it. Okay? Because it's something new, it's something exciting, it's something they will be able to put on their resume. And that's just fantastic. To be honest, in this whole year, I didn't lose anyone.
[00:44:50] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:45:06] Unknown:
I would still love to have a real data governance solution that is open source. I think we're missing that: something which is approachable. I cannot name names, because I don't want to make more enemies than I already have in this field. But something easy to use, like the iPhone of data governance, in an open source way, with a paying model of course. I'd love to see something like that.
[00:45:37] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team have been doing at PayPal to implement these data mesh principles. It's always great to speak with people who are taking some of these nebulous ideas that are circulating around the community and putting them into concrete action. So I appreciate you taking the time to join me and share your experiences. Thank you again, and I hope you enjoy the rest of your day. Well, thank you very much, Tobias, and thank you for running this show. It's very educational for a lot of people, and I learn a lot from it as well. So thank you.
[00:46:13] Unknown:
Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts at dataengineeringpodcast.com with your story.
And to help other people find the show, please leave a review on Apple Podcasts, and tell your friends and coworkers.
Introduction and Sponsor Message
Guest Introduction: Jean-Georges Perrin
Implementing Data Mesh at PayPal
Defining Data Mesh and Its Scope
Organizational Challenges and Strategies
Data Contracts: Purpose and Implementation
Tooling and Technical Uniformity
Challenges and Lessons Learned
Innovative Uses and Adoption of Data Mesh
Future Plans and Exciting Projects
Resources and Advice for Data Mesh Implementation
Final Thoughts and Closing Remarks