Summary
The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Data Management (MDM) is the process of building consensus around what the information actually means in the context of the business and then shaping the data to match those semantics. In this episode Malcolm Hawker shares his years of experience working in this domain to explore the combination of technical and social skills that are necessary to make an MDM project successful both at the outset and over the long term.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Random data doesn’t do it — and production data is not safe (or legal) for developers to use. What if you could mimic your entire production database to create a realistic dataset with zero sensitive data? Tonic.ai does exactly that. With Tonic, you can generate fake data that looks, acts, and behaves like production because it’s made from production. Using universal data connectors and a flexible API, Tonic integrates seamlessly into your existing pipelines and allows you to shape and size your data to the scale, realism, and degree of privacy that you need. The platform offers advanced subsetting, secure de-identification, and ML-driven data synthesis to create targeted test data for all of your pre-production environments. Your newly mimicked datasets are safe to share with developers, QA, data scientists—heck, even distributed teams around the world. Shorten development cycles, eliminate the need for cumbersome data pipeline work, and mathematically guarantee the privacy of your data, with Tonic.ai. Data Engineering Podcast listeners can sign up for a free 2-week sandbox account, go to dataengineeringpodcast.com/tonic today to give it a try!
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
- Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
- Your host is Tobias Macey and today I’m interviewing Malcolm Hawker about master data management strategies for the enterprise
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving your definition of what MDM is and the scope of activities/functions that it includes?
- How have evolutions in the data landscape shifted the conversation around MDM?
- Can you describe what Profisee is and the story behind it?
- What was your path to joining Profisee and what is your role in the business?
- Who are the target customers for Profisee?
- What are the challenges that they typically experience that leads them to MDM as a solution for their problems?
- How does the narrative around data observability/data quality from tools such as Great Expectations, Monte Carlo, etc. differ from the data quality benefits of a MDM strategy?
- How do recent conversations around semantic/metrics layers compare to the way that MDM approaches the problem of domain modeling?
- What are the steps to defining an MDM strategy for an organization or business unit?
- Once there is a strategy, what are the tactical elements of the implementation?
- What is the role of the toolchain in that implementation? (e.g. Spark, dbt, Airflow, etc.)
- Can you describe how Profisee is implemented?
- How does the customer base inform the architectural approach that Profisee has taken?
- Can you describe the adoption process for an organization that is using Profisee for their MDM?
- Once an organization has defined and adopted an MDM strategy, what are the ongoing maintenance tasks related to the domain models?
- What are the most interesting, innovative, or unexpected ways that you have seen MDM used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working in MDM?
- When is Profisee the wrong choice?
- What do you have planned for the future of Profisee?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Profisee
- MDM == Master Data Management
- CRM == Customer Relationship Management
- ERP == Enterprise Resource Planning
- Levenshtein Distance Algorithm
- Soundex
- CDP == Customer Data Platform
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
[00:00:18] Unknown:
When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% reported being at or over capacity.
With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey. And today, I'm interviewing Malcolm Hawker about master data management strategies for the enterprise. So, Malcolm, can you start by introducing yourself?
[00:02:01] Unknown:
Yes. Thank you. I'm Malcolm Hawker. I'm the head of data strategy for Profisee. Profisee is a Magic Quadrant vendor of MDM, master data management software. My primary job is to serve as an evangelist within the data management community and to bring awareness of the benefits of MDM, why you should care, how MDM fits into an overall data ecosystem, why you need to be thinking about governance as well. I'm sure we'll talk about that in a lot more detail. But at a high level, that is my role with Profisee.
[00:02:34] Unknown:
And do you remember how you first got involved working in the area of data?
[00:02:38] Unknown:
Oh, gosh. That's a great question. Yes. And it was kind of the seminal moment as well. I mean, I had been kinda born and raised in the software development world, and there was always data there. Of course, there's databases sitting under all applications. And I had managed a number of engineering teams that were deploying and managing database software. So I suspect I've always been in data management going across my entire 30 year career. It sounds even crazy to say that out loud. But, yeah, almost 30 years. But I really dove into the data management space, the data and analytics space, when I got hired as a consultant for a $1,000,000,000 publicly traded company. My SOW, my statement of work as that consultant, was really simple. The statement of work said, solve the problem of answering how many customers we have.
That was pretty much it. That was the entire SOW. And I was like, man, this is gonna be the easiest gig ever. Right? I'll go in, and I'll help them establish some analytics. We'll stand up some dashboards. I'll connect a few databases. I'll run a few integration scripts, and I'll be off to the races. And the client will just be ecstatic. And I figured out rather quickly that it wasn't gonna be that easy, that there was a lot more involved in simply answering the question of how many customers do we have because, you know, what I found and what everybody inevitably finds is that, you know, in this case, it was b to b, but it could be b to c. It doesn't really matter. Through a b to b lens, it was Acme, Acme Inc, Acme LLC, Acme Co, and we didn't really know if that was 1 thing or 4 things.
And that was really my appetizer, I should say, into the world of data management. And it's what, believe it or not, really kind of drew me into the space because I was just fascinated by the ultimate simplicity of the problem and the underlying complexity of the problem. That kind of paradoxical notion of, in this case, MDM, master data management, because what we're talking about is a customer master record. The underlying paradox there just drew me in, and I just found it fascinating. And from that point forward, I've really kind of been focused on data management, specifically governance and master data management. Yeah. And even beyond the kind of entity resolution problem, there's also the initial question of, well, what is a customer? How do you define whether somebody is a customer or not? Tobias, I can't tell you how many days of my life I've lost sitting in meeting rooms bickering about that very topic, about how do you define a customer.
You think it's so simple, but it's just not because everybody's got a different view. Right? And marketing will think 1 way. Finance will think another way. Legal may think an entirely other way, and they're all right. Every 1 of them is right through their operational lens. Right? Marketing tends to look at things through the lens more of kind of prospects and potential customers, maybe even past customers, if they're trying to bring old customers back into the fold. You know, fulfillment or operations or logistics tends to look at ship to addresses. Like, the customer is where the goods go. Right? Legal tends to look at things through the lens of, well, okay. If things break, who's gonna try to sue us? Right? And finance will tend to look at things through the lens of rev rec. And, again, all valid views. But your comment is spot on, which is, you know, how do you define a customer? And the answer to that question is inherently a governance decision.
It's a policy. What you're defining is a policy, but it is intrinsic to a data modeling exercise as well. Right? So if you wanna set up a database and start modeling out customer, another thing that you're gonna do is you're gonna model out relationships that exist as well. Right? So that's a join. Right? I don't need to tell anybody on this podcast how to model data, but it's, again, simple but complex. Right? And it's those things together. How do I define a customer, and what does that actually mean? Well, that's a governance decision. Governance is hard. And it's also a data modeling and a data architecture question as well. So, yeah, it's simple, but it's often not.
[00:06:39] Unknown:
Yeah. It's funny because it's 1 of the perennial problems of data. And 1 of the things that does make it such an interesting and complex space to work in is, you know, if you're working in it as a software engineer, you know, building a web application from the kind of mechanical perspective, it's very rote. You know, you can say, this is the right answer because this is the way that the HTTP spec was written. I know that I can do this. I can write some tests for it, and I am done. You work in data, and it's like, oh, okay. I just want to, you know, build a customer model. Okay. A customer has a name and an address, and you start to talk to the business. You say, okay, here's my customer model, and they say, no. That's not right because they're not a customer yet, or we also need all this information to determine when they became a customer or how they became a customer or if they are, you know, a customer of our customer if you're doing b to b to c kind of a thing. Right.
[00:07:30] Unknown:
Right. It's complex, and you just touched on it and I touched on it before. It's not just the individual entity. It's the relationship between those entities. Yep. And some of those relationships may be nested. Right? In the case of a b to b customer, you know, if your customer is Berkshire Hathaway, does that mean you're also doing business with Dairy Queen? Right? Which is a wholly owned subsidiary of Berkshire Hathaway. Right? Or conversely, if you're doing business with Dairy Queen, does that mean, in theory, that you're also doing business with Berkshire Hathaway? So, you know, building out hierarchies and managing hierarchies of customer information is a key part of MDM. That's a part of it. But, again, it traces right back into a modeling discussion.
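As a rough illustration of the hierarchy question raised here, the following Python sketch models parent links between entities and rolls a record up to its top-level parent. The `PARENT` table, the function names, and the "Acme Inc" row are hypothetical and exist only for this example; the Dairy Queen and Berkshire Hathaway relationship is the one described in the conversation. Whether two members of the same hierarchy should count as one customer remains a governance decision that code like this only encodes.

```python
from typing import Optional

# Hypothetical parent links: child -> parent (None marks the top of a hierarchy).
PARENT: dict[str, Optional[str]] = {
    "Berkshire Hathaway": None,
    "Dairy Queen": "Berkshire Hathaway",
    "Acme Inc": None,
}


def ultimate_parent(entity: str) -> str:
    """Walk parent links upward until reaching an entity with no known parent."""
    seen: set[str] = set()
    current = entity
    while PARENT.get(current) is not None:
        if current in seen:
            # Guard against bad data such as A -> B -> A.
            raise ValueError(f"cycle in hierarchy at {current!r}")
        seen.add(current)
        current = PARENT[current]
    return current


def same_corporate_family(a: str, b: str) -> bool:
    """One possible policy: two entities are related if they share a top-level parent."""
    return ultimate_parent(a) == ultimate_parent(b)
```

Under this policy, a sale to Dairy Queen would roll up to Berkshire Hathaway, while a different policy might stop at the legal entity that signed the contract.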
Right? Previous to what I do with Profisee, I was an analyst with Gartner for nearly 3 years. But I got a lot of questions when I was a Gartner analyst about, okay, what's the right way to model customer data? And the right answer is that you need to understand your business strategy, and you need to understand your business goals, and you need to understand, you know, what you're trying to get out of the data. But it was inevitably a fairly complex question because of the issues of nested relationships and complex relationships that may exist within a given company.
[00:08:20] Unknown:
You've touched on it a little bit about sort of what master data management is and its role in an organization, but I'm wondering if you can just give your kind of elevator pitch definition of what master data management is when you're talking to somebody who has never come across it before and some of the overall scope of the activities and functions that it embraces.
[00:08:56] Unknown:
Yep. At a very high level, master data management is the people, process, and technologies that are required to provide the accuracy, consistency, semantic consistency, particularly, needed to optimize the value out of an organization's shared data assets. So the shared data asset part is really important because master data is the data that is used widely across the organization. When I was a Gartner analyst, I would usually visualize this with some form of a Venn diagram, like a 3 ring Venn diagram. And you can imagine right in the middle of that 3 ring Venn diagram, that's master data because it's used everywhere. Right? So not everything is master data. Just because something may be important, important to a specific workflow, important to a specific report, doesn't mean it's master data. Master data is that data that is used consistently everywhere in the organization. So things like we we typically define them through the lens of what we would call a domain or maybe an object. Right? Customer, product, location, contract, SKU.
The list here can be fairly long. But I've worked with some ridiculously large companies where, for example, their customer master record is fewer than 10 fields. Right? So we're not talking every field of customer data. We're talking the fields that are used ubiquitously everywhere and that need consistent definition, that need consistent structure, that need consistent quality, that need consistent governance policies applied to them. So that if the CEO asks how many customers do we have, there can really only be 1 answer, at least at that level. Right? We can get into more complex forms of MDM, more context driven forms of MDM where there are potentially multiple answers, where to a marketer, there could be 1 answer and the CEO, there could be another answer.
And those forms of MDM and those forms of governance exist, but they tend to be fairly rare because the kind of the if then notion having multiple answers requires a fairly mature approach to data management that frankly most companies lack. Right? For most companies, if they can just get to the point of there being 1 answer that everybody trusts and everybody agrees that, well, hey, fireworks. Fantastic.
[00:11:13] Unknown:
Yeah. Absolutely. And 1 of the interesting things about the overall concept of master data management in general is that from my understanding and from the conversations that I've had, it tends to seem as though that idea comes primarily from enterprise organizations where you have these very complex reporting requirements and complex organizational structures. And as the overall ecosystem of data management and big data and all of the associated technologies and trends have come about, You know, the conversation around MDM has faded a bit, and there are things like the metric store or the semantic layer that are, in some ways, kind of replacing those conversations. And I'm wondering what you see as the ways that the conversation around MDM has shifted along with the sort of shifts in technological and organizational capabilities?
[00:12:03] Unknown:
So to the first part about what you were essentially paraphrasing you now is you were asking about, you know, when is MDM relevant? Right? Is there a certain size or complexity of an organization where it becomes more relevant? Right? And is it more relevant for extremely large enterprises? Answer, yes. Absolutely. Right? The bigger and more complex you are and the more decentralized you are from an operating perspective. I gave the example earlier of Berkshire Hathaway. Maybe actually not a very good example because they are very, very decentralized. They're a holding company in essence, and each of the individual operations has complete autonomy to do whatever they wanna do. But there are other organizations where that's not the case. Right? Where organizations are struggling to have a single view of the customer or a single definition of the customer.
And through kind of natural growth or through mergers and acquisitions or through having a lack of governance or a lack of a centralized data management approach or maybe even the lack of an office of the CDO, over time, companies have naturally just kind of evolved to different definitions. But as a general statement, I would say the larger the company, the more they tend to have a need for MDM. Is there any sort of cutoff? No. But, generally, what I've seen in my experience is once companies hit $200, $300, $400 million in revenue, that seems to be about where they start asking questions about MDM. And almost always the first use case that pops up, almost always, is differences in CRM data and ERP data, marketing versus finance. And that's almost always where splits tend to happen for the first time. And, you know, CEOs ask for a report, and then that report takes a week to run because somebody in IT has to manually try to reconcile Acme versus Acme Inc. Right? And what that leads to, almost inevitably, on the IT side is, like, you used the words entity resolution before.
Completely correct, AKA matching. Right? And kind of rudimentary forms of MDM almost always take shape in these, you know, IT organizations where somebody generally has some sort of a data loading script or a data transformation script moving data from a to b, where b is a single bucket of data, where somebody's given the task of, okay, you know, figure out if Acme Inc and Acme LLC are the same thing or a different thing. Right? And which leads to things like, okay. Well, if they share the first 3 letters of the string or the first 4 letters of the string, chances are pretty good it's the same thing. Oh, well, wait a minute. Look. I need to look across fields. It's not just the name field. It's also the address field. And, oh, well, wait a minute. Hold on a second. You know, there's natural differences here that appear from kind of a bill to and a ship to, and there's lots of different versions of Acme that are all valid. And then all of a sudden, you know, the vendors of MDM software, their phones start to ring.
That's when IT organizations are like, oh, wait. This is way harder than I thought it was gonna be. The stuff that we did with load scripts or, you know, ETL scripts to try to resolve for this just doesn't work or isn't scalable. Or we did it 2 years ago, and the guy who wrote the script left, and now we can't unravel it. We don't know how the thing works, so maybe we should be looking at software. So there's no hard and fast rule of when, you know, MDM becomes needed, but it's often born out of IT frustration around entity resolution. Now the second part of your previous question, you'd asked kind of about evolving technologies, and are technologies helping to solve for this? And yes and no is the answer. Right? Like, can you use new technologies, like data virtualization technologies, for entity resolution? There's a lot of things that make MDM different, at least from a software perspective.
And if you were talking to a Gartner analyst, they would show you something called critical capabilities of MDM, and there's 13 of them that make MDM software unique. There's an ETL capability. There's a workflow capability. There is data governance. There's data quality, right, where you build business rules into the software to say, well, this is when something is accurate. This is when it's not accurate. There's a UI component obviously to it as well to allow for data stewards to manually review data and on and on. But, you know, the answer to your question is, can new technologies help? Yes. They can certainly help. MDM software is 1 of those technologies that makes these processes a little more scalable, a little more configurable, for sure, where you can kind of turn some of the dials that are used for matching or that are used for some of the business rules around data management. But when you start talking some of the whiz bang y stuff like AI and ML, data virtualization, other new technologies, they really have a hard time solving for some of those core problems that I was talking about, particularly entity resolution.
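To make the limits of the load-script approach from this answer concrete, here is a minimal Python sketch of the two shortcuts mentioned: matching on the first few characters of the name, then also requiring the address to agree. The function names, the sample records, and the "Acmedia Group" company are invented for illustration and are not taken from any real script or product.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def prefix_match(a: str, b: str, n: int = 4) -> bool:
    """Naive rule: treat two names as the same entity if their first n characters agree."""
    return normalize(a)[:n] == normalize(b)[:n]


def name_and_address_match(rec_a: dict, rec_b: dict, n: int = 4) -> bool:
    """Second attempt: also require the normalized address to be identical."""
    return prefix_match(rec_a["name"], rec_b["name"], n) and (
        normalize(rec_a["address"]) == normalize(rec_b["address"])
    )


# Hypothetical records: a bill-to and a ship-to for the same company,
# plus an unrelated company that happens to share a prefix and a building.
acme_bill_to = {"name": "Acme Inc", "address": "1 Main St"}
acme_ship_to = {"name": "Acme LLC", "address": "500 Dock Rd"}
acmedia = {"name": "Acmedia Group", "address": "1 Main  St"}

# The prefix rule alone merges an unrelated company (false positive).
prefix_false_positive = prefix_match("Acme Inc", "Acmedia Group")

# Adding the address splits the bill-to and ship-to records (false negative)...
address_false_negative = not name_and_address_match(acme_bill_to, acme_ship_to)

# ...while still merging the unrelated company in the shared building.
address_false_positive = name_and_address_match(acme_bill_to, acmedia)
```

Each added rule fixes one case and breaks another, which is the maintenance spiral described here.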
[00:16:44] Unknown:
Yeah. It's it's definitely interesting how every technology tries to obviate the ones that came before when they all need to work in conjunction instead of just saying, we're the new shiny thing. This is all you'll ever need.
[00:16:56] Unknown:
Right. Well, a great example right now are, you know, data warehouses in the cloud. And I won't name any vendors, but there are certain kind of cloud based data warehouse technologies that are saying we can enable a single version of the truth. And they absolutely can enable a single version of the truth, but they're gonna look at source data and they're going to see Acme Inc, Acme LLC, Dairy Queen, and say, okay, wait a minute. Those are probably 3 different things because they have 3 different source IDs. Right? They may go so far as to run some very basic data quality and understand, okay, maybe some rough similarities there, but these are probably 3 different things. So I'll put them into 3 master IDs in my data warehouse and poof, there you go. We've got a master record, a single source of truth for Acme Incorporated. But then you've got another 1 for Acme LLC sitting right next to it and another 1 for Dairy Queen sitting right next to that. Where Dairy Queen and Acme LLC may actually be the exact same thing. Maybe they changed names. Maybe they are part of the same corporate hierarchy where you've made a decision to say anybody in that hierarchy is part of the same corporate family or shares the same customer ID.
Probably not as relevant a use case, but you get my point. Right? Which is a data warehouse can absolutely be used to establish a single source of the truth. Yes, it can. But does it have all the flexibility and configurability to allow for all the things that MDM software can do? Typically, they don't. Right? I had many clients when I was a Gartner analyst ask me, hey. I'm standing up a data warehouse. You know, insert big name here. It doesn't matter. Right? AWS, Azure, it doesn't matter. Any proprietary names. Can I use that as my MDM? And the answer almost always was, well, probably not.
Probably not because they don't have those 13 capabilities that MDM software are purpose built to solve for, to support. So they have 1 or 2 of them. Right? They've got integrations that are kind of hardwired in. They've got the single repository, and that's a good thing, whether that is physical repository or virtual repository, 1 or the other. But do they have stewardship UIs? Do they have complex business rule management for governance policies and on and on? No. They don't. And the same is true with a lot of the source systems. So I used to get asked all the time, hey. We're spending a lot of money on salesforce.com for a CRM. Can I use that as my MDM?
Well, again, probably not. Right? Salesforce, awesome tool. It can be a great marketing source of truth. But as an enterprise wide source of truth, you're probably gonna run into a situation where people in finance or legal or operations or logistics or you name it probably don't view that data the same way that people in sales view that data because Salesforce or any CRM system is purpose built for sales centric use cases, not enterprise wide use cases. Absolutely.
[00:19:39] Unknown:
And in terms of the overall MDM effort in an organization, who are the people who are typically responsible for that once they do say, okay. This is an actual endeavor that we have to support. We need to invest in it. You know, maybe you bring in a vendor solution to be able to help with that. But who are the people who are actually responsible for making it work and maintaining it over the long run?
[00:20:00] Unknown:
The textbook answer is that MDM is supposed to be a collaboration between IT and the business, where there is an active collaboration between systems and operations and software, which is the domain of IT, and business rule management. I will just loosely say business rule management, AKA requirements. You can call them requirements, but it's all the business rules that would be configured into an MDM for things like, how do I define a customer? How do I define those customer relationships? What are the data quality rules that this system will use? Right? When are 2 records the same and when are they not the same and on and on. So in a perfect world, there's an active collaboration. There are people on the business side of the house who are defining all the requirements, and they're helping define some very basic data governance policies that would be configured into an MDM. Then on the IT side of the house, generally, MDM lives within some data and analytics function. Right? The same team that is gonna be deploying Tableau or Qlik or Birst or the same team that has the data science function, the same team that typically would have some, you know, data integration functions, data management, data modeling functions. That's generally where MDM lives.
Typically, 90% of the time, MDM is being deployed and implemented by consultants. This is a metric that Gartner published in its Magic Quadrant 2 years ago. I don't see any reason why that would have changed in the last 2 years. So consultants are heavily involved here because there's a shortage, if you ask me, of subject matter experts who really know MDM very, very well, particularly within various vendor solutions, whether that is Profisee, the company that I work for, or anybody else. Consultants can play a very, very important role here because they will have experts who know that software and who can help get it up and running. An MDM team will typically involve some sort of program lead, the director of MDM, who would manage some form of a small team that would generally include some form of an analyst.
Can be 2 forms of analyst, more of a business analyst, somebody that would know the business processes and dive into things like data lineage, business process, how, you know, how did customer records or product records or whatever records get created. There can be more systems analysts as well. Right? The people that tend to understand the back end and who know, you know, how to build ERDs and may even know SQL and can help from a systems and operations and deployment perspective. Generally, some form of a data architect involved. Generally, some form of a systems architect involved. And then getting back into the business side, inevitably, once you have the MDM software up and running, there is a need for what's called data stewardship. Right? Human beings to manage exceptions. So you'll code rules into an MDM that says, Acme Inc and Acme LLC are the same thing. And you'll say, you know, we'll weight the name 20%. We'll weight the address 30%. So the algorithms that are running in MDM programs are fairly advanced, and are getting even more advanced thanks to the addition of graph and a few other cool technologies.
But the algorithms can only go so far. Right? Typically, you will have some humans involved for exceptions where the algorithm says, I can't make a firm determination whether Acme Inc and Acme LLC are 1 thing or 2 things. And I keep using a B2B context here, but it could be gsmith@gmail.com and jeffsmith@gmail.com. Right? Whether that's 1 thing or 2 things is equally relevant in the consumer and corporate, B2C and B2B, spaces. Although in the B2C world, you know, the prevalence of email addresses, I wouldn't say it makes it necessarily easier.
But, you know, email address plus cell phone, there are some identifiers out there that are, I won't say persistent, but are more often used.
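The weighting and exception handling described above can be sketched roughly as follows. This is a minimal illustration, not how any particular MDM product works: the 20% name and 30% address weights come from the conversation, while the email weight, the two thresholds, and the use of Python's `difflib` for string similarity are assumptions made only for the example.

```python
from difflib import SequenceMatcher

# Name and address weights are the ones mentioned in the episode.
# The email weight and both thresholds are hypothetical.
WEIGHTS = {"name": 0.2, "address": 0.3, "email": 0.5}
AUTO_MATCH = 0.90      # at or above this, treat the pair as the same entity
NEEDS_REVIEW = 0.60    # between the two thresholds, route to a data steward


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities."""
    return sum(
        weight * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


def classify(rec_a: dict, rec_b: dict) -> str:
    """Decide whether a pair is a match, a non-match, or an exception for a human."""
    score = match_score(rec_a, rec_b)
    if score >= AUTO_MATCH:
        return "match"
    if score >= NEEDS_REVIEW:
        return "steward_review"
    return "no_match"
```

Pairs that land between the two thresholds are the exceptions a data steward would look at; moving the thresholds trades steward workload against the risk of a wrong automatic merge.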
[00:23:31] Unknown:
As far as the actual technology solutions, you mentioned that there are these vendors out there, Prophecy being the 1 that you work for and the 1 that we'll probably spend most of our conversation on today. And I'm wondering if you can just talk to some of the ways that that software integrates with the data systems of an organization to support and maintain that MDM solution and some of the capabilities that they bring in and some of the ways that you think about kind of selling into an organization and going through that integration and implementation phase?
[00:24:02] Unknown:
There's really kinda 2 high level, what we would call, styles of MDM. There are analytical styles of MDM, and then there are operational styles of MDM. A great way to think about MDM is that it will be a data hub. These are hubs of data. These are collection points of data where they will sit on top of, logically speaking, on top of source systems of data. Right? So to deploy an MDM, you'll set up this this hub. Increasingly, these are cloud based. Right? And pick your cloud. Doesn't really matter. We can run it in Azure. We can run it in in AWS on and on, where that hub will be collecting data from multiple source systems. Right? It'll go collect customer data from a CRM. It'll collect customer data from an ERP or often multiple ERPs. Right? For a lot of bigger companies, they'll have more than 1. Right? Where it'll be, I'll go get data on customers from ERP 1, ERP 2, ERP 3, often from, you know, IT service management type applications like a ServiceNow. Anywhere there's customer data, the MDM will have an integration to those systems, right, where it could be a pull of data. It could be a push of data. It could be a Kafka stream of data. It doesn't really matter.
But there are certainly APIs on that MDM hub. It'll constantly be polling for data on customers, new records or edits to data. And where that customer data will kind of be streamed down, a light version of that customer record will persist in the MDM. Right? So, again, you know, I'd mentioned before, it's not gonna be all 200 fields of your customer record. It's gonna be a limited number of fields in your customer record that are replicated into an MDM hub. Now getting back to that notion of analytical versus operational, an analytical MDM really solves the question of a 360 of something. I will loosely use the word 360 of something. Right? If your only use case is to solve for a single view of your customers, then an analytical use case is perfect. And all you're trying to do in that case is to tie all of those versions of Acme together under some unifying ID. Right? Using a set of business rules that are configured in the MDM, where all that data gets pulled into a centralized hub, and then you try to unify all those versions of Acme under some new master ID and potentially even a master record that just kinda serves as a stub, as a placeholder. Think of it more as a kind of a registry where each of those, let's just call them child IDs, are linked to some parent ID in the MDM hub, which will allow you to then aggregate any sort of data associated to those child records in a consistent, accurate, trustworthy way so that you could run a report that says, Here's how much business we're doing with Acme Incorporated. So that's an analytical style of MDM where the flow of information is 1 way from the source systems into the hub and it dead ends in the hub. And where you could be using, you know, those IDs and those keys for reporting in your enterprise wide analytics platform, but where there isn't a bidirectional flow between the MDM and the source and the contributing systems.
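As a rough illustration of the registry idea described above, the sketch below links child IDs from several source systems to a parent master ID and then rolls transaction amounts up to that parent. The system names, IDs, and the `UNRESOLVED` bucket are all hypothetical; a real hub would derive the cross-reference from its match rules rather than hard-coding it.

```python
from collections import defaultdict

# Hypothetical cross-reference: (source system, local child ID) -> master ID.
XREF = {
    ("crm", "C-17"): "M-001",       # "Acme Inc" in the CRM
    ("erp1", "900412"): "M-001",    # "Acme Incorporated" in ERP 1
    ("erp2", "ACME-LLC"): "M-001",  # "Acme LLC" in ERP 2
    ("crm", "C-22"): "M-002",       # some other customer
}


def revenue_by_master(transactions, xref):
    """Sum amounts per master ID; records with no link go to 'UNRESOLVED'."""
    totals = defaultdict(float)
    for system, local_id, amount in transactions:
        master_id = xref.get((system, local_id), "UNRESOLVED")
        totals[master_id] += amount
    return dict(totals)


transactions = [
    ("crm", "C-17", 100.0),
    ("erp1", "900412", 250.0),
    ("erp2", "ACME-LLC", 50.0),
    ("crm", "C-22", 75.0),
    ("erp3", "unknown-id", 10.0),
]
report = revenue_by_master(transactions, XREF)
```

Nothing is written back to the source systems here, which matches the one-way flow of the analytical style: the hub only supplies the keys that make the report consistent.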
That bidirectional flow would be more akin to an operational pattern of MDM, where that first half that I described, MDM will pull in the customer records, the product records, the location records, whatever objects that you're focusing on. It will create a new master record for Acme Incorporated or for a product or for a location. And then it can actually turn around and syndicate that data back down into consuming systems. In theory, you could start with 5 versions of Acme Incorporated in CRM. But once MDM has run its processes, you could merge those records down into a single record and propagate that back into a CRM or back into an ERP with 1 consistent record.
So 2 different styles of MDM, kind of typified as analytics versus operations, with the first 1 being relatively simple to deploy. I'm air quoting simple here, because you're not merging records. You're not getting into the business of changing any workflows. You're not getting into the business of changing how core applications work. That's more the domain of operational MDM, where you could be actually, you know, trying to inform or enforce business rules within contributing systems. So you could even go so far as to use MDM with a real time connection to, say, a CRM system where somebody's typing a new record into a CRM, Acme Inc, where in real time, via an API connection into an MDM hub, the source application could say, wait a minute. This already exists.
Right? Are you sure you'd wanna create this record for Acme because we know it already exists? Yes or no. That type of thing. So 2 different deployment styles. Generally, these are data hubs where the data is sitting and is being persisted in an MDM hub where it could be used in multiple downstream applications, including, you know, BI platforms or even operational systems themselves.
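The real-time "this already exists" check could look something like the sketch below. It is an in-memory stand-in for what would be an API call to the hub; the `MdmHub` class, its method names, and the list of legal suffixes are hypothetical and exist only for this example. Real products use far richer matching than stripping suffixes.

```python
import re

# Hypothetical list of legal-form tokens to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


class MdmHub:
    """Toy stand-in for a hub that a CRM would query before saving a new record."""

    def __init__(self):
        self._by_key = {}

    def register(self, master_id: str, name: str) -> None:
        self._by_key.setdefault(normalize(name), []).append((master_id, name))

    def find_possible_duplicates(self, name: str) -> list:
        """Return existing master records whose normalized name equals the new one."""
        return list(self._by_key.get(normalize(name), []))


hub = MdmHub()
hub.register("M-001", "Acme Inc.")

# A CRM user starts typing a new account; the CRM asks the hub first.
candidates = hub.find_possible_duplicates("ACME, LLC")
if candidates:
    prompt = f"{candidates[0][1]} already exists. Create anyway? Yes or no."
```

The design choice worth noting is that the hub only returns candidates; the yes-or-no decision stays with the person creating the record.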
[00:28:44] Unknown:
As far as the strategy of going into an organization and saying, okay. We've got the technology. We have a way to be able to reconcile these different source records. We can figure out what the sort of combination is supposed to be, but now we need to actually figure out how we want to combine those. What are those canonical business objects, and how do we define them? What are the necessary attributes? And just figuring out the actual strategic elements of implementing MDM, figuring out those governance policies, and then translating that into the actual tactical elements and turning that into an ongoing process that gets maintained.
And then once we explore that, maybe feeding into the conversation about, okay. I've got my master data management. I've figured out how to map it across my entire organization. And then, oh, shoot. Now I just went and bought another company. I've gotta figure it out all over again.
[00:29:43] Unknown:
Right. Right. The process you first described, you know, what are the business rules that you use to merge records? Right? If you've got 4 versions of Acme Incorporated, you know, what rules do you use to merge them together? That's what, you know, MDM and governance nerds like me call survivorship. What are the rules to do that? That's the hard part of MDM. That's really the hard part of MDM because what you're trying to decide is who wins. Right? Does marketing win or does finance win? Or is there some sort of compromise that happens between those 2 groups or the 5 groups or the 10 groups or the 10 divisions? It doesn't matter. Everybody wants their version to win. So that's the real hard part of MDM. The technology of MDM so, you know, putting on my analyst hat here again, MDM is both a discipline. It's a way to manage data, but it's a technology. So it's both. It's a noun and a verb.
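One common way to express survivorship is as a per-attribute ranking of source systems, where the highest-ranked source that has a value wins. The sketch below assumes that approach; the attribute names, source names, and rankings are hypothetical, and deciding those rankings is exactly the "who wins" negotiation described above.

```python
# Hypothetical survivorship policy: for each attribute, sources in priority order.
SOURCE_PRIORITY = {
    "legal_name": ["erp", "crm", "marketing"],  # finance's system wins on legal name
    "email": ["marketing", "crm", "erp"],       # marketing's system wins on email
}


def build_golden_record(records_by_source: dict, priority: dict) -> dict:
    """Pick, per attribute, the first non-empty value in priority order.

    The winning source is kept next to each value so the decision can be audited.
    """
    golden = {}
    for attribute, sources in priority.items():
        for source in sources:
            value = records_by_source.get(source, {}).get(attribute)
            if value:
                golden[attribute] = {"value": value, "source": source}
                break
    return golden


acme_versions = {
    "crm": {"legal_name": "Acme Inc", "email": "sales@acme.example"},
    "erp": {"legal_name": "Acme Incorporated", "email": ""},
    "marketing": {"legal_name": "ACME", "email": "hello@acme.example"},
}
golden = build_golden_record(acme_versions, SOURCE_PRIORITY)
```

The code is the easy part; agreeing on the contents of `SOURCE_PRIORITY` across marketing, finance, and every other group is where governance and an executive arbiter come in.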
And when it gets into operational styles of MDM, when you start talking about merging records, when you start talking about having a single version of the truth that is used operationally, then that's when things get really, really hard. And if you don't have some form of governance, if you don't have strong executive support where you have an executive stakeholder who's acting as an arbiter, right, making sure everybody gets along and that the rules are being followed, that's kind of where MDM programs can go sideways. Right? If you talk to a lot of, you know, elder statesmen like me who've been in data management a while, chances are pretty good they will have some experience in a failed MDM program. I hear it all the time, where that failed MDM deployment has a lot of scar tissue associated to it. Right? The story kinda goes like this, which is, oh, well, you know, we needed a single version of the truth.
And, you know, maybe our regulator told us to do that in more of a banking use case, or maybe it was our auditor told us to do it, or maybe our CEO told us to do it because we were trying to work towards some sort of, you know, digital transformation type endgame. And we hired a consultant, and they came and did a due diligence. And they spent 9 months on the as is. And they spent 9 months kinda cataloguing all of our data, talking to all of our stakeholders, and building a business glossary, and, you know, detailed understanding of our lineage and on and on and on. And then by end of year 1, we had a new CIO, and they got frustrated with the lack of progress on MDM, so we shelved it.
I mean, I hear that stuff all the time, and I used to hear it when I was an analyst all the time. And that kind of typifies the what not to do on an MDM deployment. Right? Because when the consultants descend, they're used 90% of the time. So don't get me wrong. You're probably gonna get some value out of consultants. You're probably gonna need their help. But there is a little bit of a vested interest for the consultants to make this a pretty big program. Right? Because when they make it a big program, they get paid more. Right? And so how do you find a balance? Right? If you've got a need for MDM, if you have been given a mandate by your management to come up with a single version of the truth or a single view or a 360 view of customer, just be very, very careful that you avoid scope creep and that you avoid trying to take a kind of a big bang approach, that you avoid a lot of the key pitfalls that often send MDM programs sideways. And I touched on a couple of them. Right? 1 is having too big of a scope.
Right? Don't try to break all of your data silos all at once. Just break a couple data silos. Like, break 2 or 3. Right? Be very focused. Take like an MVP, minimum viable product type approach, a limited scope approach here, and get something off the ground quickly. So don't try to boil the ocean from a scope perspective. That's 1 kinda key success factor. Another is to have that executive support we were talking about. So, so, so critical. You need somebody in your corner from an executive perspective. 3rd part is don't cut corners on governance. You need the business engaged. You need them to help define customers and customer relationships.
You need them to help with stewardship. So don't cut corners on that. Number 4, know what your expected business outcomes are. I can't tell you how often I would get on the phone when I was an analyst with IT leaders, and they would say, hey, Malcolm. We wanna do MDM. We got executive support, and they signed the checks. And they're excited, and we're ready to go. And I'd ask the question, well, why? Well, because our data's bad. Right? Our data's bad, and we've been given a mandate to fix the data. Here's, like, a cold hard truth that I learned very, very early on in the data and analytics space.
Nobody cares really about bad data other than IT people. Right? For data people like us, it pains us to see bad data. Right? Like, it causes, like, an antibody response. It's like, we gotta fix the data. Right? What don't you get? We gotta fix the data. But then on the business side of the house, they're like, hey. We're hitting our sales quotas. Right? We're shipping the goods out. Right? We're selling what we need to sell. We're hitting our numbers, and you're telling us the data is bad. 1 of the hardest lessons I learned, I walked into a CFO's office to try to get funding for an MDM program, and I put together what I thought was a business case. This is MDM dependency number 4. I have a business case. I thought I had a business case, walked into the CFO's office, and he looked at it. And the business case was all about fixing the data. Right? Reducing our duplicate rate, eliminating null fields in customer records, eliminating malformed fields, and it was all data quality centric. And those things are important, but the CFO looked at it and said, you know, we're a publicly traded company. I got Deloitte in here 4 times a year auditing my books. They say our data is good. My chief revenue officer says the data is good. Our logistics people say the data could be better, but we're still delivering most of the goods, and everything's fine. And you're telling me everything's bad.
So who am I supposed to believe? Right? Am I supposed to believe all those other people, including my auditor, or am I supposed to believe you? Right? I went back to the drawing board and I said, okay. Wait a minute. I need to look at this a different way. I need to look at this through the lens of, can we sell more? Right? Can we reduce our costs? Can we improve our customer experience? Right? And when I did that, I built a business case for MDM strictly on cross sell, upsell. That was it. That's all I did. For scope number 1, I forgot that there were other things that we needed to fix as well. All I did was go to a 3rd party data provider. I bought information about corporate hierarchies, and I loaded it into a limited, stripped down data hub.
And I did some entity resolution on that data, and I was able to show people in sales that we were selling to division a of a company, but we weren't selling to division b. And I didn't even use the word governance. I didn't use the word data quality. I didn't even use the word MDM. I just said, hey. We ran this report that showed where you could be selling more. Does this interest you? Of course, it interests me. Uh-huh. Eureka. I've got funding for MDM.
[00:36:14] Unknown:
Absolutely. It's 1 of the hard lessons that we have to repeatedly learn as technologists is that technology for the sake of technology is pointless, and only technologists care about it. Nobody cares about your data until it causes a problem for them. You know, nobody goes to your website because of your beautifully written software. They go because it solves a need for them. Like, they don't care how magnificent your middleware layer is because they're never going to see it, and they never want to see it. If they do see it, then you've done something horribly wrong.
[00:36:43] Unknown:
Right. Yeah. Exactly. I mean, I'm pretty active on LinkedIn, and I, you know, I'm pretty active in the industry. And I just I keep scratching my head because I keep seeing these posts from pundits like me who keep talking about, hey. Focus on business value. You need to focus on business value. And it's like, well yeah. Right? Like, everybody thinks they are through their metrics. Right? And if you're in the analytics group and how you are incented is either the deployment of software or maybe it's the speed at which your dashboards run. Maybe you're in the operation side and you're making sure that nothing stops running. Right?
That's what being customer centric is to you. That's what being business focused is to you because that is your business. Right? Keeping the databases running is your business. Keeping the ETL scripts running is your business, and that's how you're incented. Right? At the end of the year, your boss will sit down with you and say, okay. Did the ETL scripts run? Yes. Did the service keep running? Yes. Right? Did you hit your 24 hour SLA on that build? Yes. You're customer centric. I get it. It seems again, getting back to that theme, it seems so simple to say, be focused on outcomes, focus on business outcomes and business value.
But when you're in IT and business value equals those things that I was just talking about, you think you're hitting the mark. So I don't know. I think maybe as, you know, as data leaders, data management professionals, maybe we need a different lexicon. Maybe we need, you know, a different glossary to describe this stuff. I'm not sure. I don't have any magic answers. But, yeah, I see a lot of these constantly. Well, there's a lot of finger wagging on LinkedIn, you know, about, hey. Focus on business outcomes, and shame on you for not focusing on business outcomes. But if you ask most people, they'll tell you they already are.
[00:38:27] Unknown:
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state of the art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up for free or just get the free t shirt for being a listener of the data engineering podcast at dataengineeringpodcast.com/rudder. Taking that wonderful conversation and going in completely the opposite direction now, I'm interested in digging into some of the kind of technical and data modeling considerations of when you're building an MDM solution. So you say, okay, I've got this platform. I'm going to be able to, you know, do the entity resolution for these attributes. You know, how do I think about modeling these different domain objects and the data records? You know, is it primarily a relational exercise? Is it a graph exercise? You mentioned that graph technologies are something that is being brought into more of these MDM solutions. So as somebody who is working at the data layer and, you know, I'm responsible for making sure that all the source systems are staying up to date and that I'm cleaning things appropriately, and I'm managing my data quality. I'm feeding that into my MDM solution. How do I want to present all of those records to that MDM platform to make sure that I'm modeling things appropriately so that those business users can obtain that value from these, you know, beautifully modeled records that nobody else will ever care about?
[00:40:01] Unknown:
Yep. Yep. So there's a few questions nested in your question. Let's start with the data modeling, and then there was another question really kind of relational, nonrelational, and where are things going? You know, what are they? The answer to the question of, you know, relational, nonrelational is yes. Increasingly, they are both, but kind of legacy old school MDM is a highly relational exercise because the legacy algorithms that run need structure. Right? So there are a well known set of algorithms for entity resolution, without going into too much boring detail: Jaro-Winkler, the Levenshtein distance algorithm. Soundex is another 1. These algorithms, particularly the distance driven ones, measure the difference between characters. And I'm at the boundary of my intellectual capabilities. But to make a long story short, traditionally, they run on kind of doing kind of value pair analysis. Right? Every pair of records is evaluated against each other, which means that that process, entity resolution, requires a little bit of structure and is very compute intense.
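To make the pairwise, distance-driven idea concrete, here is a small sketch of Levenshtein distance plus an all-pairs comparison. It is a textbook dynamic-programming implementation written for illustration, not the code any MDM product runs; the `name_similarity` normalization and the helper names are assumptions. Comparing every pair means n(n-1)/2 comparisons for n records, which is why the process is compute intense.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner row to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in 0..1 derived from edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def compare_all_pairs(names):
    """Score every unordered pair of names: n * (n - 1) / 2 comparisons."""
    return [(x, y, name_similarity(x, y)) for x, y in combinations(names, 2)]


pairs = compare_all_pairs(["Acme Inc", "Acme LLC", "Acme Incorporated", "Globex"])
```

In practice, platforms typically avoid comparing literally every pair by first grouping records into candidate blocks, but the quadratic growth is the underlying reason matching needs structure and compute.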
Right? However, you can be very specific about the outputs. Right? You can configure the outputs. You can weight the algorithms. You can say use address 10%, or use name 10%, or ignore this field or apply these other rules. So that's kind of old school MDM classic. Right? And for the most part, that is still how most MDM platforms are running these days. And there is still a pervasive belief in the business, at least within the user community, or I should say, particularly for more legal driven use cases, that that configurability is absolutely critical. Right? That you need control, you need auditability, you need even rollback, like, the ability to unmerge or unmatch or match as of a certain date.
Right, where the results of the match are persistent and consistent and predictable, particularly, again, for certain use cases. This is still how most MDM processes are running these days, and most MDM platforms utilize that form of matching, again, kind of born out of more legal or more well defined use cases. There is a growing use of graph in the space, where there's a growing focus on trying to merge that relational driven, compute intense world with non relational, graph driven, relationship driven, node driven approaches to matching. Now nobody has really kinda figured out how to bring these 2 worlds effectively together. Graph driven matching, or graph driven entity resolution, is still very, very early, by the way, from the perspective of its technological maturity. You're on pretty cutting edge ground here in terms of, like, graph based entity resolution. But where I see things going is that there could potentially be kind of a waterfall type approach to entity resolution, where it's a first pass or maybe a second pass. Right? Where you use graph for a first pass to support more marketing centric use cases where close enough is good enough.
Right? Where you're not gonna get sued if you get it wrong, you know, or you're not gonna break any rules or regulations or be noncompliant if you get things wrong, where, you know, again, it's a marketing driven use case where close enough is good enough. And if you send the wrong piece of mail or the wrong offer to the wrong person, well, not optimal, but you're not gonna get sued. Right? So I could easily see a world where graph kinda gets integrated into this space where you have both non relational processes that are running and relational processes that are running in the background. What I just described is kind of a bit of a next gen type approach to MDM. There's not a lot of vendors that are really kind of doing that yet for a lot of different reasons, mostly because graph based entity resolution is still pretty new. But that's kind of where the industry is headed. Now from an analytics perspective, there's a lot of graph that's being used in the MDM space. Right? Because it can do analytics, and it can help with relationship mapping and hierarchy management in ways that provide a lot of flexibility, and they're very user friendly. Right? Where, using graph, you can kinda hit the hierarchy-o-matic button. Right? It's like, go build my hierarchy for me.
Now if you trust the data a 100%, poof, you're done. Right? You could run those processes, and if you trusted your data, your source data, you're good. But I don't know anybody that does. Right? So even in a world of AI assisted, graph assisted, you know, what the industry pundits would call more augmented forms of data management, even in that world, there is a role for kinda human driven oversight here. But, historically, you're talking generally about relational databases that are running match processes in support of more legal or compliance or regulatory driven use cases where the cost of being wrong is is generally pretty high, but where there are new use cases and new applications that are being built to support other use cases that tend to be a little more marketing driven.
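As a very rough picture of what a graph-style first pass might look like, the sketch below treats records as nodes, links any two records that share an identifier such as an email or phone number, and reports each connected group as one candidate entity. This is only one simple way to frame it, chosen for illustration; the conversation does not describe a specific algorithm, and the record IDs and identifiers are made up. It fits the "close enough is good enough" tier, with stricter weighted rules and human review layered on afterward.

```python
from collections import defaultdict


def cluster_by_shared_identifiers(records: dict) -> list:
    """Group record IDs that are connected through any shared identifier.

    records maps a record ID to a set of identifier strings (emails, phones, ...).
    Uses a small union-find structure to compute connected components.
    """
    parent = {record_id: record_id for record_id in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # identifier -> first record that carried it
    for record_id, identifiers in records.items():
        for identifier in identifiers:
            if identifier in first_seen:
                union(first_seen[identifier], record_id)
            else:
                first_seen[identifier] = record_id

    clusters = defaultdict(set)
    for record_id in records:
        clusters[find(record_id)].add(record_id)
    return sorted(sorted(members) for members in clusters.values())


records = {
    "r1": {"gsmith@example.com"},
    "r2": {"gsmith@example.com", "555-0100"},
    "r3": {"555-0100"},
    "r4": {"someone.else@example.com"},
}
clusters = cluster_by_shared_identifiers(records)
```

Note that `r1` and `r3` share nothing directly and are grouped only through `r2`. That transitive linking is what makes the approach flexible, and also why a shared phone number or a bad source value can over-merge, which is where the human oversight mentioned above still matters.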
And, interestingly, what I just described is the difference between MDM and a new technology that is out there, been around about 5 years, called the CDP, a customer data platform. So customer data platforms are the kind of the new shiny object in the space. They're the new whiz bang y thing, where if you do a search for, you know, single version of the truth or gold master record, what you'll get back is a lot of customer data platforms that are very, very marketing centric where, again, they're non relational in nature where that customer data is not typically persisted over time, where that hub does not persist over time, where even across campaigns or across loads of data, you could get different answers to the same question, which, again, may be good enough for a marketing use case. But, typically, enterprise wide use cases need more persistence and consistency in in the answers.
[00:45:38] Unknown:
In your experience of working in the space of master data management and helping organizations adopt the technologies and understand its utility and the ramifications and the ongoing maintenance that's required? What are some of the most interesting or innovative or unexpected ways that you've seen MDM used or solutions that you've seen to address that need?
[00:45:58] Unknown:
A lot of MDM use cases tend to lean towards kinda same old, same old. Right? Like the customer master record or a product master record. There's some new things afoot in the area of what I would loosely call data sharing. Now I happen to think that there's a lot on the horizon here, and there's some interesting stuff on the horizon, but where companies are getting together to share data. So think about everything I've described of what MDM is. It's a single hub of your customer data, your product data, your location data, your asset data, your employee data. It doesn't matter. But think of that and start looking across multiple companies.
Right? Company a, company b, company c, company d. Right? You could, in theory, use MDM to sit on top of all of those other MDMs. You could build an MDM of MDMs where in theory, you know, 2 plus 2 could equal 5, where you could start to drive some network effects by looking across multiple sources of data to create a single version of the truth where those companies are comfortable in sharing that data. Right? I would argue, if you can Google search the name and address and headquarters of Verizon, right, you know, that's not competitive data for you as a company. Right? If it's publicly available out there, I would argue that's highly commoditized and probably not a competitive differentiator to know where the headquarters of Verizon is or of any other company. So that's the type of data that I could see start to be used in more of a shared mode. Right? So there's going to be interesting economies of scale there because right now, and it could be a Verizon record, could be an AT&T record, it could be an Acme Incorporated record. It doesn't matter.
But across companies, across the globe right now, they're all managing their customer data, their product data, their location data, and it's all being managed largely the same way, and it is horribly duplicated. Meaning, company a, company b, company c, and if they're all relatively large companies, they're probably all doing business with Berkshire Hathaway. They're probably all doing business with Verizon or AT&T. They're probably all doing business with any of the Fortune 1000 or 2000 companies. Right? And they're all applying stewardship to that data. They're all applying business rules to that data. They're all applying storage and compute to that data. And could we start to manage some of that data as more of a shared asset?
So 1 of the more interesting approaches I've seen here, there is, you know, a company over in Europe that has kind of created a consortium for common data management. Consortiums of data have existed for a long time. A great example is like UPC codes, like barcodes and products. Right? At the core of that is a consortium of companies that have got together that have agreed on a common set of data governance policies. Right? Business rules. How do these UPC codes work? How do the barcodes work? What do they mean? Right? That required a bunch of companies to get together, generally, in the form of some sort of industry group. Like, you know, they will create some sort of, like, an industry group that manages the standards. The same is true with any data standards organization, where I could see more and more and more of that evolving over time. Because there's really cool economies of scale that could exist there where instead of a large company paying for 4 or 5 or 6 or maybe even 10 data stewards I was just at a conference 2 days ago in San Diego where I heard of a company that had a 150 data stewards. Right? A 150 people, all they're doing is managing data quality. That's it. Right? To make sure that customer records were accurate. It's a 150 people involved in customer record management.
If you started to manage some of this data as a shared asset, well, if you drove that 150 people down to, like, 2 or 3, right, where the maintenance of the data becomes more of kind of a shared pooled thing, that could be relevant. That could be interesting. Some other interesting MDM use cases. I mean, there's certainly a lot of advances in AI and ML in this space. Some vendors are focusing on increasing uses of AI and ML, particularly when it comes to data governance rules and data management where you could build algorithms to kind of train entity resolution processes over time, where the decisions of data stewards could be used to help train match decisions, that kind of thing. We talked about graph, talked about data sharing. There's a few interesting things afoot, maybe even this notion of a data fabric. Again, I was at a conference a couple of days ago in San Diego, and there was a lot of buzz about data fabric. Most of the buzz was data fabric versus data mesh. It appears there are evolving tribes there where there's a data fabric tribe and a data mesh tribe, and they're rather animated in their belief that 1 may be better than the other. I don't get into all of that. But on the notion of a data fabric, if you ask me, a data fabric is a world where data starts to inform its own classification, its own use, its own management. Right? Where a data catalog, with a semantic layer on top of it, with an MDM layer on top of it, with a few other layers, a data quality layer on top of it, where the metadata out of that catalog could start fueling and augmenting.
Augmenting matters as a word because it's not gonna replace. It'll just make decisions better. It'll augment legacy decisions around integration patterns, data quality rules. Right? MDM business rules, matching rules. Right? Where metadata could be used to say, and plus transactional data, you could say, well, this transaction failed or it didn't fail. Why? Well, it's because it had these record attributes that was the fit that was the successful transaction. This 1 was a failed transaction. What does the failed transaction look like? Well, all of a sudden, over time, you could evolve to start to see, okay. Well, that's what the data quality rule should be. And oh, and by the way, that actually could be considered master data because it seems to be used in a lot of other places where you didn't know it was used before.
Put all those things together where you have kind of a self informed, I don't wanna say self governing, but at least a more automated form of governance, more automated form of MDM, that's the data fabric. By the way, if you talk to the people, at least at Gartner, who are behind the creation and pushing kind of the data fabric narrative, many of whom I know very, very closely, these are friends of mine, they'll tell you that there's only, like, about 5, 6, or 7 companies on the entire planet that truly have a real data fabric. So this is, you know, 5 to 7 years from mainstream. There are vendors out there that are saying data fabrics exist. Most Gartner analysts and ex analysts, including myself, would disagree with that, but MDM will play a key role in data fabrics going forward.
[00:52:16] Unknown:
In your own experience of working in the space of MDM and helping your customers now that you're at Profisee, and working with organizations when you were an analyst, I'm just wondering what are some of the most interesting or unexpected or challenging lessons you've learned in the process?
[00:53:41] Unknown:
Oh, boy. Never assume that your definition is the same as somebody else's definition. Right? Never assume that. Like, it's a mistake to assume that the way that you look at customer, the way that you look at product, or that the way that you look at employees is the same as anybody else. That's certainly a lesson I've learned. Never underestimate the power of a strong business partnership. I would take a partnership with a motivated, engaged senior director or VP who has budget over a CEO or a CIO saying we need to do something because we need to do something. Right? I've seen a lot of situations where checks get cut and a CIO or CEO says, we gotta go do this, fix your data quality, or do MDM, and then 9 months later, everything has gone sideways.
But if you've got somebody in an operational role who is responsible for, you know, selling stuff or delivering stuff or making stuff, and if you've got a good partnership with them and they are motivated and they've got a lot of pain and they wanna work with you to solve for this, that's worth its weight in gold. That is absolutely worth its weight in gold. So that's another lesson. A third lesson is scope, scope, and scope. And did I mention scope? When it comes to MDM, do not try to boil the ocean. Keep your scope limited for a first launch. Get something out the door that delivers some value quickly. And break 2 or 3 silos, but don't break 15. You may have 15 ERP systems, and that hurts. I get it.
But break 2 or 3. And, again, go back to find those operational leaders who have acute pains, and you'll tend to know who they are because they're the ones that are complaining the loudest. That's certainly a lesson: manage for scope. And the last one I learned is this is not about domains. It is not about domains. Right? I used to ask when I was an analyst, what's your focus? Right? How are you gonna manage your scope? Well, we're just gonna focus on customer data. That's how we're gonna limit our scope. We're gonna focus on the customer domain. And I'd take a deep breath, and I would say, you're not limiting your scope. Customer data is everywhere. Right? That's a false sense of security when it comes to scope management. And, by the way, nobody within the organization has any sort of incentives. Nobody is paid on domain. People are paid on processes. Right? Selling more, delivering it faster, making it faster, reducing our fuel costs. That's how people get paid within the organization. That's how they make their bonuses.
And that's what you need to attach MDM to. You need to attach your MDM to one of those things, not a domain. Because, again, just like nobody cares about data quality, nobody cares about domains. Most people don't even know what that means in the organization, by the way. And you could change it to object if you want. They don't know what that is either. So focus on a business process and say, okay, I'm going to enable my chief revenue officer to cross-sell. That's it. Boom. Right? And I'm not even gonna do it for every division or every product line or every SKU. I'm gonna do it for a very limited subset of our products. What I always heard when I would say that to my clients is, that's multi-domain. Right? Because then I have to master some product data and I have to master customer data. I may even have to master contract and maybe even location. I was like, yep, you will. But keep it limited to sources. Keep it limited to the systems that you're integrating with.
And, yes, you'll cross domains, but your chief revenue officer or your chief product officer will be able to tie right back to what you've done. They'll be able to point a finger and say, you know what? We were selling x before, and now we're selling y. And we're pretty confident that the only reason why we're doing that is because of this MDM thing. But when you're focused on domains, it's like, I don't know if it moved the needle or not. I don't know. So if you focus on process, you're in a good place.
Yeah. I'd say that's a valid lesson regardless of whether you're talking about MDM or just, you know, business analytics and data management writ large.
Exactly. Yeah. I mean, what I just said could be you're responsible for a data quality project or you're responsible for turning up analytics. You're deploying Tableau for the organization. Right? Everything I just said could be applied to all of those use cases. Absolutely.
[00:57:46] Unknown:
And so for people who are trying to figure out how to gain better visibility or better understanding of their organization, what are the cases where MDM is the wrong choice and it's actually too heavyweight, and maybe they should just go and, you know, buy the latest metrics layer or, you know, use some of those AI technologies that will do entity resolution for you?
[00:58:07] Unknown:
It's a good question. Sometimes I worry that I'm an MDM hammer. Right? And all I see are MDM nails. Right? I think, really, the question you just asked is not whether companies need MDM. I think the question is whether they need MDM software, because it's not cheap. Right? Even though the prices are really coming down and you can get enterprise-class solutions like Profisee and others for, like, sub-six figures. Right? I'll stop selling. But the prices have been coming down. But still, for some smaller companies, I think you could ask the question, okay, wait a minute. Am I gonna spend $100,000 on MDM software? Plus, I'll probably spend double that on consultants to get it up and running. So I'm all in for a quarter million, probably, and I've got a problem. Right? I don't have a single view of the customer, or my customer reports are inconsistent, or, you know, whatever some of those pain points are. You know, even small companies can have those problems.
So I think the question is not whether you need MDM, because let's not forget MDM is a discipline, first and foremost. It's a way of managing data. It is a way of making sure you have consistency, quality, accuracy, right, the structure for your core shared data assets so they can be used widely. Chances are pretty good you need MDM as a discipline. But you can do MDM in Excel. Yeah, I said it. You could do it. You know, are you going to have real-time, back-and-forth integrations with source systems? No. But if you wanted to build a customer 360, you know, you could bring data out of a few different source systems, drop it into even an Access database, and run some basic business rules against it to try to link everything together. Right? It's gonna be a little brittle. It won't be that scalable.
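To make the "basic business rules" approach concrete, here is a minimal Python sketch of linking customer records pulled from two source systems into a simple customer 360. Everything in it is an assumption for illustration: the sample records, the field names, the list of legal suffixes, and the single matching rule (same normalized name plus same postal code). A real project would agree on its own rules with the business, and, as noted above, this kind of thing is brittle and will not scale far.

```python
import re
from collections import defaultdict

# Hypothetical extracts from two source systems (e.g. exported to CSV or Excel).
CRM = [
    {"source": "crm", "id": "C1", "name": "Acme, Inc.", "postal": "30301"},
    {"source": "crm", "id": "C2", "name": "Globex LLC", "postal": "60601"},
]
BILLING = [
    {"source": "billing", "id": "B7", "name": "ACME Inc", "postal": "30301"},
    {"source": "billing", "id": "B9", "name": "Initech", "postal": "73301"},
]

# Assumed list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in SUFFIXES)


def match_key(record):
    """Business rule: same normalized name and postal code means same customer."""
    return (normalize_name(record["name"]), record["postal"])


def build_customer_360(*sources):
    """Group records from all sources by match key under a shared customer ID."""
    groups = defaultdict(list)
    for source in sources:
        for record in source:
            groups[match_key(record)].append(record)

    linked = {}
    for number, key in enumerate(sorted(groups), start=1):
        linked[f"CUST-{number:04d}"] = [(r["source"], r["id"]) for r in groups[key]]
    return linked


customer_360 = build_customer_360(CRM, BILLING)
for customer_id, members in customer_360.items():
    print(customer_id, members)
```

With this sample data, the CRM record `C1` and the billing record `B7` end up under the same customer ID, while the other two records each stay on their own. Exact-key matching like this misses typos and abbreviations, which is one reason the approach is described as brittle.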
Maybe in time, you'll migrate to some form of MDM software, but you may still be able to drive some significant business value by saying, here's our customer 360 report that we couldn't do before. When you layer in, like, some third-party data services out there, particularly in the B2B world, the B2C data providers tend to be fairly expensive. Well, so do the B2B providers. But if you're in B2B mode and, you know, you've got, let's say, 50,000 customers and you don't feel good about the accuracy of the data, and, you know, you've got some bad data, you lack a single version of the truth or a single customer identifier, you know, there are providers out there that you can bounce that data against, use their match engines.
Right? Use their, you know, what they call reference data. And pretty quickly and relatively affordably, using Excel as a source, have some form of a customer 360 up and running. So I would argue most companies need MDM as a discipline. Do you need MDM software? Is the MDM software too thick for you? It might be. If you've only got, you know, a few thousand customers, 1,000, 3,000, 4,000, 5,000, MDM software may be too thick.
Yep. But chances are, you still have some use cases that need that consistent approach to the data management side of it. Are there any other aspects of master data management and the strategic and technical elements, or the ways that you're approaching it at Profisee, that we didn't discuss yet that you'd like to cover before we close out the show?
Yeah. I mean, thank you. This is kind of the opportunity to sell a little bit. You know, Profisee is enterprise-class MDM software. We are on the Gartner MDM Magic Quadrant. We are a challenger in the quadrant and have been making significant progress over the last few years. We were one of the fastest growing last year.
Where we differentiate, where Profisee differentiates, is time to value and simplicity of deployment, which are 2 of the biggest challenges related to MDM, going back to the story I told a while ago about, you know, paying a ton for consulting fees over a year and then getting nothing to show for it. So Profisee really focuses on being fast, time to value, and easy to deploy. Right? So we run natively in any cloud, but we're particularly good in the Azure cloud, where we actually have an integration to Purview, which is Microsoft's data cataloging solution. We're in the marketplace, so you can be, like, up and running, you know, platform as a service, literally in minutes.
Right? Where you don't have to worry about deploying servers necessarily or procuring hardware. You know, if you've got your data in Azure cloud or any other cloud for that matter, Profisee is exceptional about getting up and running very, very quickly. That is, like, so different than it was even just 2 or 3 years ago, where, you know, getting MDM up and running was a challenge in and of itself. So our software can be up and running very, very quickly. Now, the hard part of MDM is those business rules that we talked about, and those will take some time to configure. So it's not like you just turn it on and it magically runs on its own, getting back to our very early conversations about how to define a customer. So there are decisions that need to be made, and there are business rules that need to be configured into the software. It just doesn't run magically on its own. But if you're looking to be up and running in 3 months, right, if you have more of those analytical use cases that I was talking about earlier, if you want a fast time to value and if you want a lower total cost of ownership, Profisee is extremely price competitive, consistently one of the more price-competitive solutions out there. The key variable here is the amount of data that you're pumping through the data hub. If you've got, you know, 40,000,000 customer records, you're obviously gonna be paying more for a solution than you would be if you had 20,000 customer records.
But through our native integration to Azure, you know, what you're paying for is just the software. It's really kind of a bring-your-own-cloud type approach where, I mean, the hosting fees and the compute fees will be yours to deal with through Microsoft, where, you know, all you're buying from us is the software. So lower total cost of ownership, speed to value, ease of use are areas where Profisee really, really excels. Right? So if you wanna be up and running in a few weeks and not a few months, Profisee is a fantastic solution.
[01:03:46] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[01:04:01] Unknown:
It's such a great question. And I think that the gap traces back to this notion of business outcomes. If we are making MDM software or if we're making data quality software, if we are making data integration software, or if we're even making data warehouse software, right, I think we, as vendors, can do a better job to find ways to trace the impact of our software on the business. Right? Meaning, in your data warehouse, you have transactional data that shows details of your transactions. You've got metadata on every successful transaction. You've got data around every detail, every nugget of the customer experience. You've got data around what people clicked on, what they didn't click on, and on and on and on. I think we kinda fall into this trap where we say, okay, those are the end-user-facing apps. Right? Those are the ones like your CRM system, your digital marketing platforms, all the apps that are running in marketing. Well, those are the ones that are gonna optimize the customer experience, and those are the ones that are going to optimize our product pricing or our messaging or the speed or the cadence of our marketing campaigns or all the stuff that touches customers.
Right? And I think we, within the data management space, particularly software vendors, could do a better job to reach into that world. And the data's there. We've got it. Right? It's sitting in data warehouses. It's sitting in data lakes. It's sitting in MDM hubs or data quality hubs, and find ways to link that data back to what's being done from a data management perspective. Right? To link that data back to a decision that was made from an integration perspective or data quality perspective or an MDM perspective. And to say, okay. Wait a minute. We can trace a hard line back from that decision that was made about that data quality policy that was changed to an increase of sales. Right? If we could do that, that's the holy grail. Right? This is what we've always wanted in the data management space to be able to say, we have a hard impact on business outcomes.
But we kinda wash our hands, and we just say, okay. Well, that's the domain of the business apps. CRM systems of the world, the list here is very, very long. Right? They're responsible for that. Right? But I think we could do more to find ways to link what we do in data management to actual business outcomes. Because if we can do that, then we're able to show the ROI of data management. Then we're gonna get the business attention. Then we're gonna get the investment. Then we're gonna get people. For right now, where we are is, you know, hey. Invest in us, please. Right? You need reporting, so you know, and you need visibility, so you better invest in reporting. Or you need better data, then you should invest in MDM. Right?
Yeah. I think we can go farther. And I don't have all the answers there, but the data's there. That part I know.
[01:06:41] Unknown:
All very true, and it definitely is reflected in kind of the way that the industry is trending, where in the, you know, early to mid 2000s and 2010s, it was all about big data. Just collect everything all the time because maybe it'll be useful. And now being very much more intentional about what is being collected and how it's being applied, both because of regulatory risks, but also because of organizations being more cost conscious and more of the sort of small to midsized businesses starting to adopt those capabilities, and they don't have these massive, you know, cash troves to throw at this investment in the hopes that someday it will be valuable.
[01:07:15] Unknown:
Yeah. Well, and the companies that did, you know, a lot of them spent millions and millions on standing up Hadoop clusters, many of which are still running, and they're driving some value. Don't get me wrong. But for a long time there, for a lot of companies, big investments in Hadoop were, you know, the glib metaphor that I use here is that they were, you know, creating a whole bunch of questions, or, a whole bunch of answers that were desperately seeking questions. Absolutely. Right? It's like, hey, we found this anecdotal insight about a, b, c. Hey, business, did you know? Absolutely. So how do we avoid that? How do we go from, we know you care because we've got the metadata to show that you care?
[01:07:57] Unknown:
Awesome. Well, thank you very much for taking the time today to join me and share your experience and perspective on the overall space of master data management. It's definitely a very fascinating area, and it's always great to be able to dig into it with somebody who has such deep knowledge and experience in the space. I appreciate all of the time and energy that you've put into helping to support that ecosystem and to share it with us today. So thank you again, and hope you have a good rest of your day.
Thanks, Tobias. Really enjoyed it. Same to you. Talk soon.
Thank you for listening. Don't forget to check out our other shows, the Data Engineering podcast, which covers the latest on modern data management, and the Machine Learning podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you learned something or tried out a project from the show, then tell us about it. Email hosts@pythonpodcast.com with your story.
And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Industry Trends
Guest Introduction: Malcolm Hawker
Malcolm's Journey into Data Management
Defining Master Data Management (MDM)
MDM in Large Enterprises
Technological Evolution in MDM
Analytical vs. Operational MDM
Strategic Implementation of MDM
Business Value and MDM
Innovative Uses of MDM
Lessons Learned in MDM
When MDM is Not the Right Choice
Profisee's Approach to MDM
Biggest Gaps in Data Management Tools