Summary
For any business that wants to stay in operation, the most important thing they can do is understand their customers. American Express has invested substantial time and effort in their Customer 360 product to achieve that understanding. In this episode Purvi Shah, the VP of Enterprise Big Data Platforms at American Express, explains how they have invested in the cloud to power this visibility and the complex suite of integrations they have built and maintained across legacy and modern systems to make it possible.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- You wake up to a Slack message from your CEO, who’s upset because the company’s revenue dashboard is broken. You’re told to fix it before this morning’s board meeting, which is just minutes away. Enter Metaplane, the industry’s only self-serve data observability tool. In just a few clicks, you identify the issue’s root cause, conduct an impact analysis—and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
- Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
- Your host is Tobias Macey and today I’m interviewing Purvi Shah about building the Customer 360 data product for American Express and migrating their enterprise data platform to the cloud
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what the Customer 360 project is and the story behind it?
- What are the types of questions and insights that the C360 project is designed to answer?
- Can you describe the types of information and data sources that you are relying on to feed this project?
- What are the different axes of scale that you have had to address in the design and architecture of the C360 project? (e.g. geographical, volume/variety/velocity of data, scale of end-user access and data manipulation, etc.)
- What are some of the challenges that you have had to address in order to build and maintain the map between organizational and technical requirements/semantics in the platform?
- What were some of the early wins that you targeted, and how did the lessons from those successes drive the product design going forward?
- Can you describe the platform architecture for your data systems that are powering the C360 product?
- How have the design/goals/requirements of the system changed since you first started working on it?
- How have you approached the integration and migration of legacy data systems and assets into this new platform?
- What are some of the ongoing maintenance challenges that the legacy platforms introduce?
- Can you describe how you have approached the question of data quality/observability and the validation/verification of the generated assets?
- What are the aspects of governance and access control that you need to deal with being part of a financial institution?
- Now that the C360 product has been in use for a few years, what are the strategic and tactical aspects of the ongoing evolution and maintenance of the product which you have had to address?
- What are the most interesting, innovative, or unexpected ways that you have seen the C360 product used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on C360 for American Express?
- When is a C360 project the wrong choice?
- What do you have planned for the future of C360 and enterprise data platforms at American Express?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey, and today I'm interviewing Purvi Shah about building the Customer 360 data product for American Express and migrating their enterprise data platform to the cloud. So, Purvi, can you start by introducing yourself?
[00:01:49] Unknown:
Sure. Thanks for having me here. My name is Purvi Shah. I've been with American Express for 12 years, and currently I lead the enterprise data platforms team, responsible for transforming the way American Express manages its big data ecosystem. Prior to that, I managed Customer 360. So I'm excited to be here and to speak with you about this topic. And do you remember how you first got started working in data? It was back in 2000. I'm gonna date myself here a little bit. I was in college at the time, at Penn State University, and one of my first jobs was to look through reams and reams of search engine data to find commonalities: spellings, mistakes, the same topics, and things like that, trying to figure out how to create the kind of search engine that we know today. So that's where my passion for working with data started. Ever since then, I've worked on a variety of different things, including Customer 360 recently.
[00:02:49] Unknown:
And in terms of the Customer 360 project, can you give a bit of an overview of what the kind of scope and objectives are and some of the story behind how American Express decided that they wanted to invest in that capability?
[00:03:03] Unknown:
As you know, at American Express, we focus on our customers; they are our priority. As part of that, one of the things we realized is that American Express serves a multitude of different types of customers. It could be individuals like you and me with a Platinum Card or any of our other card products, or merchants, or small businesses, or large businesses. So, as you can imagine, it is a wide variety of customers, not only in the US but globally. As we have grown as a company, we realized that we had created silos in our data, and that resulted in a poor customer experience. That was really the seed for creating Customer 360, with the idea and the vision to bring all of the customer data into one place and be able to understand what relationships an entity, whether it's an individual or a business, has with American Express or had with American Express in the past.
[00:04:02] Unknown:
Customer 360 as a kind of term is something that can be very nebulous, and people hope that it's the solution to all of their problems. And so as far as being able to provide a reasonable scope and a clear understanding of what its target is, can you give a bit of an overview of the types of questions and insights that it's aimed at answering and some of the ways that these different stakeholders within American Express typically interact with the platform?
[00:04:32] Unknown:
Sure. I love the question because you're right. Customer 360 is sometimes thought of as the holy grail that solves all problems. Right? But in our case, we were very specific about the scope of Customer 360. Our scope encompassed three things. First, we wanted to understand who our customers are, which meant getting all of the demographic data that had proliferated across multiple databases into one place. Second, understanding the relationships and cleansing that data. And third, making it use case agnostic: ensuring that this data is truly used as a product at American Express, whether during the acquisition journey, credit decisioning, fraud decisioning, or servicing across the board. We really wanted to make sure it was use case agnostic and contained no use-case-specific logic that would prevent it from being used across a multitude of use cases.
So it's truly an enterprise asset that is used across use cases globally.
[00:05:38] Unknown:
American Express as an institution has been around for a long while, and I'm sure there are parts of your systems that even predate the computer age. So I'm curious if you can talk to the types of information you're working with, the sources of data, and the types of integrations you're relying on to feed into this holistic view of the customer and to make it a maintainable and sustainable asset going forward.
[00:06:05] Unknown:
I have to say, it was a massive cleanup exercise when we started down this path. Right? I think all of us underestimated how long it takes to clean up data that has accumulated over so many years. You'd be surprised to know that we had demographic data for a customer across 15 different platforms. Cleansing the data meant understanding cases like the famous example of Christopher being truncated to Christophe in one system: how do we know it's actually Christopher, and that the individual's name is not Christophe? We leveraged AI/ML heavily for the matching algorithms across the board, and we also relied on the data provided by our individual customers, marrying it with some external data as a proxy. So we basically did a triangulation to come up with the best customer record, and that's what Customer 360 is built on today.
[00:07:08] Unknown:
To that question of entity resolution and working across multiple, possibly conflicting sources of information, and given that you have an ongoing relationship with the customer, I'm curious how much of that cleanup you are able to lean on the customers themselves for. For example, as they log into their online portal, you could say, this is some information that we have; please verify its accuracy, and use that as part of the cleanup process. And then how are you able to resolve some of that cleanup for the case where you don't have an ongoing relationship, or maybe it's a past customer who has since closed their account?
[00:07:46] Unknown:
For Customer 360, we relied on a lot of information, across a multitude of different channels, to clean up the data and arrive at that answer. I don't have the breakdown of what percentage came from customers providing the information versus what came from our own internal entity resolution. But what I can tell you is that the work was diverse, because we have both businesses and individuals that we needed to clean up information for. For businesses, we relied a lot on information from the bureaus. We also relied on our sales team to collect the latest and greatest information and feed it into the ecosystem, so that we had the best information on the business name, the doing-business-as (DBA) name, the location, and so on. For individuals, we also relied on bureau information and tallied against it. In addition, we triangulated all of the data collected across our digital channels, whether the customer is interacting with our mobile app or calling into our World Service, to ensure that we had the best and latest information as validated by the customers.
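As an illustration of the Christopher/Christophe problem and of triangulating a "best customer record", here is a minimal Python sketch under stated assumptions. It is not American Express's matching logic, which relies on AI/ML models not described here; the source-trust ranking (`SOURCE_TRUST`), the legacy field width (`FIELD_WIDTH`), and every other name in it are hypothetical. A value is treated as a truncation only when it exactly fills its source's fixed field width and another source extends it, so a genuine "Christophe" confirmed by the customer is kept.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical values for illustration: relative trust per source (higher wins)
# and the fixed field width of a legacy system that silently truncates names.
SOURCE_TRUST = {"customer_verified": 3, "bureau": 2, "legacy_system": 1}
FIELD_WIDTH = {"legacy_system": 10}


@dataclass(frozen=True)
class Observation:
    """One reported value for a single attribute (e.g. first name)."""
    value: str
    source: str
    observed_on: date


def looks_truncated(obs: Observation, others: list[Observation]) -> bool:
    """True if obs exactly fills its source's field width and another
    observation extends it, e.g. 'Christophe' (10 chars) vs 'Christopher'."""
    width = FIELD_WIDTH.get(obs.source)
    if width is None or len(obs.value) != width:
        return False
    prefix = obs.value.upper()
    return any(
        len(o.value) > width and o.value.upper().startswith(prefix)
        for o in others
    )


def best_value(observations: list[Observation]) -> str:
    """Survivorship: drop likely truncations, then prefer the most trusted
    source and, within equal trust, the most recent observation."""
    candidates = [o for o in observations if not looks_truncated(o, observations)]
    winner = max(
        candidates,
        key=lambda o: (SOURCE_TRUST.get(o.source, 0), o.observed_on),
    )
    return winner.value


records = [
    Observation("Christophe", "legacy_system", date(2014, 5, 2)),
    Observation("Christopher", "bureau", date(2021, 8, 9)),
]
print(best_value(records))  # Christopher
```

Ranking by source trust and then recency is only one possible survivorship policy; a production system would more likely score many attributes jointly and keep the losing values for audit.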
[00:08:56] Unknown:
Another interesting element of the problem: as you mentioned, you had a lot of cleanup to do at the outset, and American Express is a very well established institution. So I'm sure there are a number of different axes of scale that you had to work across: the geographical scale of where your customers are located, the sheer volume of data, and the variety of data sources and formats. I'm curious how you came to understand the magnitudes of those different axes of scale, and how you approached project planning and implementation to break that complexity into achievable units of work, turning something gargantuan and seemingly unattainable into something where you can say, okay, we're making progress on this piece; we're going to ignore that piece over there for now; this is our deliverable, and this is how we're going to demonstrate value and get useful feedback to understand what piece to attack next.
[00:10:02] Unknown:
I think we had an entire playbook on this. We started out with one use case, and it was around a pain point that maybe you have felt yourself: you are living in a particular country and have a credit card and a relationship with American Express. For example, let's assume that you're living in the UK, you had a card with us, and for whatever reason you moved to the United States. That global transfer of a card was one of the first use cases we started with: how do we make it simple when somebody moves to a new country where they may not have an established credit history, when we know that this person was our customer in the UK, had a great credit relationship with us, and is a valuable customer of ours? How do we make sure we take that into account when we make a credit decision on someone who has no credit history in the United States, for example?
So we basically took a use-case approach and started by asking: what are the big use cases that we want to tackle? That was the first one. The second was acquisition: how do we make acquiring additional card products easier and more seamless for somebody who already has a relationship with us? Could we prefill the information? How do we take that to the next level and make it seamless if you had one type of relationship, like a merchant relationship, and moved into a consumer relationship? We tackled a lot of those use cases, and I would say that's how we created the foundational system. We focused on creating the foundation while making sure we showed value along the way by unlocking some of these use cases and providing the best customer experiences across the board, which is what our customers expect us to do.
[00:11:44] Unknown:
In terms of the stumbling blocks, the mistakes that you made along the way, and the challenges that you had to address, I'm wondering if you can talk to some of the inherent complexities that you had to unwind to keep making forward progress. And what were some of the organizational challenges you had to overcome? Maybe you had different stakeholders that you had to convince that this was a worthy exercise, or you had to find shared vocabularies that were understandable across different business units and geographies, building that map from the organizational aspects of the problem space to the technical requirements and semantics you wanted to represent in the platform, so that the implementers and the stakeholders could have a smooth conversation.
[00:12:38] Unknown:
Yeah. I think there were two big lessons learned for us. First, when you are embarking on a big transformation initiative like Customer 360, you obviously have a large legacy ecosystem, with data that resides in multiple different places. In our case, there were 15 or so applications, each serving that demographic information into different use cases. What we quickly realized is the importance of encapsulating the changes we're making so that the transition is pain free for the end users. Otherwise, the migration from our legacy applications into our new application is a behemoth task in itself. So the questions we asked were: how do we encapsulate our services? How do we encapsulate the API so that it is seamless for the end users, and they don't feel the pain of moving from getting the data out of our legacy systems to getting it from the new one?
That was one big lesson learned: when data has proliferated into so many different places, you don't know what you don't know, and it's like peeling an onion. Right? You keep peeling and peeling, and you find new things along the way that you need to solve for. So encapsulation really worked for us. The second thing we learned is that every change has a reaction. As I mentioned, Customer 360 was created with the vision of being use case agnostic. What that meant is that when we did entity resolution, let's say we recreated an entire hierarchy for a business.
Any time we made changes to that hierarchy, the downstream users, whether analytical users or production use cases, could have trouble understanding the changes we were making and how their own processes needed to change. So it was a lot about change management: communicating ahead of time, making it clear what each change would mean, and bringing the enterprise along on the big changes we were making, specifically on cleaning up the entity data.
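The encapsulation idea from the first lesson is essentially a facade: consumers call one stable interface, and whether a given request is served by a legacy system or by the new platform becomes an internal routing decision. The following is a minimal Python sketch; every class name, field name, and the per-customer routing rule are hypothetical, not the actual Customer 360 API.

```python
from typing import Protocol


class ProfileSource(Protocol):
    """Anything that can return raw profile data for a customer id."""

    def fetch(self, customer_id: str) -> dict[str, str]: ...


class LegacyDemographics:
    """Hypothetical legacy system with its own field names and casing."""

    def fetch(self, customer_id: str) -> dict[str, str]:
        return {"CUST_NM": "JANE DOE", "CNTRY": "US"}


class Customer360Api:
    """Hypothetical new platform that already returns the target contract."""

    def fetch(self, customer_id: str) -> dict[str, str]:
        return {"name": "Jane Doe", "country": "US"}


class CustomerProfileService:
    """Stable facade: callers always get {"name", "country"}, regardless of
    which backend served the request."""

    def __init__(self, legacy: ProfileSource, modern: ProfileSource,
                 migrated_ids: set[str]) -> None:
        self._legacy = legacy
        self._modern = modern
        # Hypothetical routing rule: customers already migrated are read
        # from the new platform; everyone else from the legacy system.
        self._migrated = migrated_ids

    def get_profile(self, customer_id: str) -> dict[str, str]:
        if customer_id in self._migrated:
            return self._modern.fetch(customer_id)
        raw = self._legacy.fetch(customer_id)
        # Translate the legacy layout into the same contract callers expect.
        return {"name": raw["CUST_NM"].title(), "country": raw["CNTRY"]}


service = CustomerProfileService(LegacyDemographics(), Customer360Api(), {"c-2"})
print(service.get_profile("c-1") == service.get_profile("c-2"))  # True
```

Because callers only ever see `get_profile`, the migrated set can grow one customer (or one consumer) at a time without any caller changing code, which is the "pain free for the end users" property described above.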
[00:14:45] Unknown:
Another interesting aspect of this project is its timing. You said you started down this path in the 2017 time frame, which, in terms of the overall history of Customer 360 efforts and customer data platforms, was towards the beginning, and also at the tail end of the Hadoop revolution. It was an in-between point where a lot of people were still on the uptake of the Hadoop era, some were moving into the cloud data warehouse era, and a lot of people still had physical on-site appliances for their data warehouses. Given the state of the ecosystem and the best practices at the time, I'm wondering what examples and resources were available for you to lean on as prior art, and how much of it was a matter of forging your own path and building your own understanding of how to approach a problem of this scale and complexity?
[00:15:52] Unknown:
It's a fantastic question, and maybe I can split it into two different answers for you. The first part I'm going to address is Customer 360: what were the tech challenges we faced, what architectural decisions did we have to make along the way to make it work, and what things already available in the Hadoop ecosystem could have benefited us? The challenge we had in creating Customer 360 is that we wanted to focus on real-time decisioning and real-time entity resolution.
That was a big shift from how things were done in the past, where information would come in on a particular customer, a batch would run overnight, and we would magically create an entity, collapsing records together or splitting them apart. So it was a big decision point for us to make sure that we were in an active-active environment, with a real-time ecosystem built to manage the size and the volume of the data coming through Customer 360. The second part of your question was about the Hadoop and big data ecosystem and what we were doing with it, and maybe I can tie it to Customer 360 too.
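To make the batch-versus-real-time contrast concrete, the sketch below resolves entities incrementally: each record is linked to earlier records the moment it arrives, using a union-find structure. It assumes hypothetical exact match keys (a normalized email or phone number) in place of the ML-based matching mentioned earlier, and it only handles collapsing records together; splitting an entity apart would require rebuilding the affected cluster, which this sketch does not attempt.

```python
class IncrementalResolver:
    """Assigns each incoming record to an entity as soon as it arrives.

    Records sharing any match key (hypothetical: normalized email or phone)
    are merged with union-find, so a record's entity is known immediately
    instead of after an overnight batch.
    """

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self._key_owner: dict[tuple[str, str], str] = {}

    def _find(self, record_id: str) -> str:
        # Path halving keeps the trees shallow.
        while self._parent[record_id] != record_id:
            self._parent[record_id] = self._parent[self._parent[record_id]]
            record_id = self._parent[record_id]
        return record_id

    def _union(self, a: str, b: str) -> None:
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Deterministic root: keep the lexicographically smaller id.
            lo, hi = sorted((ra, rb))
            self._parent[hi] = lo

    def add(self, record_id: str, keys: dict[str, str]) -> str:
        """Ingest one record and return the id of the entity it belongs to."""
        self._parent.setdefault(record_id, record_id)
        for field, value in keys.items():
            if not value:
                continue
            key = (field, value.strip().lower())
            owner = self._key_owner.setdefault(key, record_id)
            if owner != record_id:
                self._union(owner, record_id)
        return self._find(record_id)

    def entity_of(self, record_id: str) -> str:
        return self._find(record_id)


resolver = IncrementalResolver()
resolver.add("r1", {"email": "jane@example.com"})
resolver.add("r2", {"phone": "555-0100"})
# r3 shares r1's email and r2's phone, so all three collapse into one entity.
print(resolver.add("r3", {"email": "JANE@example.com ", "phone": "555-0100"}))  # r1
```

Union-find makes each merge nearly constant time, which suits a streaming path; the trade-off is that it cannot undo a merge, so the "disjoining" case still needs a separate re-resolution step.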
Hadoop plays a big role in everything that we do today. As I said, data is an asset for American Express, and we rely heavily on our on-prem ecosystem. For Customer 360, a lot of the data actually ends up in our analytical environment, which is the Hadoop environment, so that we can create reports and dashboards and run and train our AI/ML models. But our Hadoop ecosystem today is also going through a big transformation. We're actually actively moving and thinking about...
[00:17:47] Unknown:
As far as the overall platform infrastructure and data architecture for this Customer 360 product, I'm wondering if you can talk to some of the design, implementation, and architectural elements that you're building on top of, and how you approached that overall design question with the understanding that you would need to be able to evolve the underlying technical capabilities?
[00:18:48] Unknown:
Here's what I would say to that. There are three things that we did. First, we knew we wanted to focus on real-time ecosystems, so we started using NoSQL databases to handle the information at the speed and scale that we needed. Second, I talked about encapsulation, so we really focused on the APIs we were creating; specifically, we used a lot of GraphQL APIs to encapsulate the data and provide it back to the end users and the customers. Third, we made the data available in our at-rest ecosystem, the Hadoop ecosystem, as well as through some of the portal work we were doing to allow our sales team, for example, to understand how they can interact with the data day in and day out.
[00:19:33] Unknown:
And then, as far as the integration challenges of working with some of the legacy data systems, incorporating them into this new data product, and ensuring their continued functionality, I'm wondering how you designed that integration path, and what some of the decision points were as to whether to leave those existing systems as they were and just consume from them, versus when it made sense to use this as an opportunity to upgrade or migrate those systems into a more modernized platform?
[00:20:07] Unknown:
Yeah, I think that's a great question, and I would say we approached it in three ways. As you can imagine, across more than 15 different ecosystems, we had a variety of ways information was passed, used, and managed. The first bucket was: there are some other ecosystems that are also transforming at the same time, and it makes sense for them to connect directly into Customer 360. Those were easier: you're on a transformation journey, Customer 360 is on a transformation journey, so let's make sure you connect directly into the Customer 360 API. That way you get the latest and greatest information as quickly as possible, and so does Customer 360 in return. The second bucket was the patterns we saw from legacy systems that could definitely be encapsulated, so that where and how the information flows, and where it comes from, is transparent to the end users. That was more about change management: you may see something different, but this is what's happening, and not many changes are required on your end. The third bucket of work was the middle ground. We realized that a lot of files were being cut specifically for a mainframe ecosystem, or specifically for a use case, and transformed along the way. Those were the nuances where part of the data, for example the demographics, was being transformed before it got delivered to the end user. A great example is country code: you would have US, but maybe the file that needs to go out needs to have 01. So how do we take those nuanced cases into account and really work with those teams to move them to a more standardized way of using our API?
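The third bucket, where a downstream file expects 01 instead of the canonical US, suggests keeping one canonical representation in the API and translating only at the edge for the consumers that still need it. This is a minimal Python sketch assuming a hypothetical per-consumer lookup table; only the US to 01 pairing comes from the conversation, and the consumer names are invented for illustration.

```python
# Hypothetical per-consumer code tables. Only the US -> "01" pairing comes
# from the conversation; real tables would be owned by the consuming teams.
CONSUMER_COUNTRY_CODES: dict[str, dict[str, str]] = {
    "mainframe_extract": {"US": "01"},
}


def to_consumer_format(record: dict[str, str], consumer: str) -> dict[str, str]:
    """Return a copy of a canonical record with the country code translated
    for one consumer. Consumers without a table get canonical values."""
    out = dict(record)
    table = CONSUMER_COUNTRY_CODES.get(consumer)
    if table is None:
        return out
    try:
        out["country"] = table[record["country"]]
    except KeyError:
        # Fail loudly rather than ship an untranslated code downstream.
        raise ValueError(
            f"no {consumer!r} code for country {record.get('country')!r}"
        ) from None
    return out


canonical = {"name": "Jane Doe", "country": "US"}
print(to_consumer_format(canonical, "mainframe_extract")["country"])  # 01
print(to_consumer_format(canonical, "customer360_api")["country"])  # US
```

Keeping the mapping at the edge means the core API stays standardized, and each consumer's table can be retired once that team moves onto the standard API.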
That was the biggest piece of work that we wanted to make sure we were able to do, and that's how we approached it.
[00:22:16] Unknown:
In your experience of working on that Customer 360 project and working through some of those migrations of legacy systems, I'm wondering if you can talk to how that informed your current efforts to migrate more of the enterprise data platform into the cloud. What categories of products are you leaning on, what are the pieces that you have decided you still want to build in house and maintain for yourselves, and what are some of the decision structures and architectural patterns that you're leaning on to continue the evolution of the data platforms that you're responsible for?
[00:22:52] Unknown:
I would say there are two ways we're thinking about it. First, we use native technology that is already available through the cloud provider where it makes sense. But there are also a lot of instances in our ecosystem that call for something more specific. I think about the ecosystem in three parts: you capture the data; you create the data, making it more organized; and then you consume the data. What we're discovering as we go through the journey is that there are opportunities to streamline this in the way that American Express needs. For example, how do we move data from the real-time ecosystem into the cloud so that it can be made available at scale? Or data quality as a service: how do we take the data that's coming in and apply data quality rules to it, whether those are rules defined by a user or anomaly detection patterns?
These are more specific to how we want to manage data for American Express in the cloud, so these are the areas where we're doubling down on creating custom solutions. But for the most part, where we can, we're using the native technology and leapfrogging ahead as much as we can.
[00:23:53] Unknown:
Another interesting aspect of the work that you're doing, particularly for Customer 360 but also carrying into the broader enterprise data platform, is data quality and data observability. What is your strategy for understanding how and when to validate the different information assets that you're providing to the different stakeholders in the business?
[00:24:17] Unknown:
We have been protecting and managing our data for a number of years, so we have a very robust way of defining data stewards in terms of who's responsible for managing the data and its quality. Where the teams have matured in terms of products over the last several years is in providing, for example, enterprise data quality as a service, so that the data stewards can come in and define the preventive rules as well as the detective rules that they want on a particular data element they want to monitor and govern. They're able to take the information that comes out of the data quality indicators and say, okay, I have an issue here, we need to double-click on it and make sure that we address any gaps in terms of data movement, balance and controls, and things like that. So we have a lot of tools that we have matured over the last several years that help the data stewards manage the data better across the board.
[00:25:12] Unknown:
And then in terms of the data stewardship aspect, I'm also curious about the governance question: how do you balance the technical elements of when and how to lean on automation with the regulatory and business requirements you're constrained by?
[00:25:39] Unknown:
There are 3 things that we're doing. First, there's how we handle new requirements that come up from the data stewards. I serve the enterprise, which means I have over 40 different teams that we have to work with to make sure that we understand the requirements, the regulatory environment, and things like that. We prioritize based on what is required for compliance and what is a must-have to protect the data for our customers and protect the brand. Then we have all of the use-case-agnostic capabilities that we're creating on the big data ecosystem. In balancing those 2, we are usually able to use governance as a way to say, hey, if we automate certain things, we're able to govern our data better, and that leads to a better customer experience. So we're doing a lot of things not for the sake of governance, but because they provide a better experience for our customers, our businesses, and our stakeholders across the board. That's how we approach it. And in terms of actually executing on it, we follow agile principles.
We have sprints that we plan for and program increments (PIs) that we plan against, and that's how we approach this problem.
[00:26:44] Unknown:
Now that you've had the Customer 360 platform up and running for a while, you have a number of different stakeholders interacting with it. What are some of the ongoing maintenance challenges that you have had to deal with, and some of the ways that the legacy systems you rely on have introduced new complexities and ongoing support requirements?
[00:27:08] Unknown:
I think there are new things that we learn every day, even though the Customer 360 platform has been in existence for about 5 years now. What I would tell you is that the challenges we encounter are things like still finding some of the data lingering in different places. For example, when somebody makes an update in, let's say, a market in EMEA or Europe, that information goes into a legacy system before it comes into Customer 360. So we're really making sure that Customer 360 is the system of record, as we call it, for demographic information, so that the information falls into Customer 360 first. The second problem that we continually work through is that a lot of what was created in the legacy ecosystem was meant to understand, for example, the relationships among multiple entities. That might have been done manually, year over year, but as we move into Customer 360, we're using the power of AI and ML to create those entities through automation.
So the change management underneath that has been an interesting one: if I compare somebody who manually joined multiple business entities together with our Customer 360 logic, what are the swap-ins? What are the swap-outs? How do we make sure that the use cases feel comfortable moving into this ecosystem? It is a lot of change management that we're working through on a daily basis. So those are the 2 challenges that we're continuing to work with.
[00:28:49] Unknown:
Another interesting aspect of the longitudinal view of building a product like this is figuring out what capabilities it unlocks. How does the ability to ask and answer questions about your customers inform the future direction of how you think about working with them, what subsequent questions you want to be able to answer, and how those new questions drive forward the scope and capabilities of Customer 360 and feed into the broader question of enterprise data assets?
[00:29:25] Unknown:
So I think it's a great question. For us, Customer 360 was created in 2017, but it really started to provide significant value during COVID. For all of us, COVID came out of the blue. It was March 2020 when we started realizing that we needed to help our customers, whether they were individuals like you and me, small businesses, or merchants across the board. Customer 360 became a foundational piece for us in programs around financial hardship and, if you remember, the SBA Paycheck Protection Program that was instituted by the government: how does American Express serve our customers in the best way possible? That's where Customer 360 was able to create a lot of value, by letting us know the customer and make decisions very quickly. As we think about the future of Customer 360, where we want to go is to continue building on the success of understanding who the customers are, getting their profile information and their relationships, and then making sure that it is used across the entire life cycle they have with American Express, from acquisition all the way to servicing. But more importantly, how do we contextualize some of this data and information in terms of how customers are interacting with us, to create a 360 view that goes beyond what we do today, which is the bread and butter?
[00:30:56] Unknown:
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability.
Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $5,000 when you become a customer.
Customer 360 platform and then moving into some of the cloud migration experience, what are some of the most interesting, innovative, or unexpected ways you've seen the capabilities you're introducing to the business being used?
[00:32:17] Unknown:
There are 2 examples that I can give you. The first is what I talked about a little earlier: during COVID, Customer 360 got used in ways we did not realize or anticipate at all, and the SBA PPP loans were one of them. The second is for enterprise data platforms, which are being used heavily in decisioning for things like Amex Offers, our digital product that allows merchants to create offers for individuals like you and me when you go spend at their location. We use a lot of AI and ML in the background to create and understand affinities for our customers, to make sure that we offer up the most relevant offers for each individual.
So it's a win-win across the board, for customers and merchants, that would not have been possible in the past without enterprise data platforms.
[00:33:07] Unknown:
As far as your experience of building those systems, evolving the capabilities for the organization, and working across the entire business, I'm wondering what some of the most interesting, unexpected, or challenging lessons are that you've learned in the process.
[00:33:23] Unknown:
There are 2 lessons learned that I would highlight. First, if the data is not managed appropriately on an ongoing basis, it creates a lot of issues for us in the future. When we created Customer 360, taking the customer demographic data from multiple different places into one place, we learned that very quickly. We thought, oh, how hard could this be? We didn't realize what we were getting into with managing and trying to combine records across 15 different ecosystems. We actually created a unique way to combine these records, and we patented it and created an asset out of the innovation we did to create a record without noise in it. So that's one. The second one is the data movement itself. I think there's a lot to be learned about how we want to move data between the real-time ecosystem and the batch, or at-rest, ecosystem.
There's a lot of work we're doing across the board to make sure that where we need high compute, we're providing those resources, and that we're use-case focused to ensure we're taking the data into the right ecosystem. So there's a lot of work being done there, and I think the cloud is going to continue to help us transform and be the catalyst we need to take that to the next level.
[00:34:43] Unknown:
For businesses that are thinking about embarking on a similar path of building a Customer 360 product, what are the cases where you think that might be the wrong choice?
[00:34:54] Unknown:
Yeah. It's a great question. Customer 360 is not the solution for every problem. We learned that within American Express as well. Let's say you have a specific use case, and I'll mention one: a marketing hierarchy for business entities, which is very specific to how you market a specific brand. That probably is not the use case you want to build Customer 360 around. So you need to be very cognizant of what problem you're trying to solve for, and whether solving it has wider use or it's an individual, use-case-specific problem. That would be my advice to the listeners on when to create a Customer 360 and when not to.
[00:35:37] Unknown:
Now that you have the benefit of hindsight, if you were to restart this entire project, I'm curious if there are any design, implementation, or architectural approaches that you would take differently.
[00:35:52] Unknown:
Yeah. I think there are 2 things that we would do differently. First, we would have started this project a long time ago. The second is specifically around some of our journeys that are not necessarily real time. For example, in our sales acquisition journeys, where a sales agent signs up a particular business, how do we take that experience and make it more real time? There are a lot of aspects of the work we're doing that could have benefited from more end-to-end experiences that lend themselves to having the information in real time, being able to make decisions on it in real time, and making it available immediately to, for example, channels for consumption.
[00:36:33] Unknown:
As you continue to build and grow the data platform capabilities for American Express, what are some of the things you have planned for the near to medium term, or any problem areas or projects that you're excited to explore?
[00:36:38] Unknown:
I think one of the big things for us is the consumption aspect. I spoke a little earlier about the 3 ways that we think about it: how do I capture the data, how do I curate the data, and then how do I consume the data? On the consumption side, we really want to focus on making sure that, where it's permissible, the data is available through the right tools of choice, BI tools for example, or is used in our AI/ML modeling environment. So a big focus for us is to rationalize the multitude of BI tools that we have within the enterprise into what we call a tool of choice, and then make it easily available for permissible use cases. That's a big focus area for us, in addition to moving the information into the cloud.
Are there any other aspects of your work on the Customer 360 product or on the enterprise data platform architecture and implementation that we didn't discuss yet that you'd like to cover before we close out the show?
One thing I would say about Customer 360 is that, if done right, there is a lot of power to this. All of us in the data community keep talking about data as a product, and Customer 360 at American Express is a true testament to creating data as a product, where we understand and manage the information on our customers.
We understand the demographic information, we have data quality rules on it, and we have a way to detect any anomalies and address them. So this is a great way for us to bring the data-as-a-product concept to life, and we're just at the forefront of it. We're really excited to continue to do more of that at American Express.
[00:38:27] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:38:44] Unknown:
I think the biggest one for us is making AI/ML more accessible and more easily available, along with the talent that goes around it. I think we have a severe talent gap across the industry in a lot of the data work that we're doing, so there is a lot of need for us to upskill. At American Express, we're doing a lot of upskilling of our internal talent, either by bringing in external individuals and external perspectives or by giving people access to LinkedIn to raise the talent pool. It's about making sure that we have the right talent, as well as focusing on how we make AI/ML more available across the board.
[00:39:20] Unknown:
Well, thank you very much for taking the time today to join me and share the work that you've been doing at American Express to help build and scale their Customer 360 capabilities, and for leveraging some of those experiences in your work on the enterprise data platform and cloud migration. It's definitely a lot of interesting and challenging technical and organizational work, so I appreciate the time and energy that you've put into helping drive the conversation forward at American Express, and the time you've taken today to share your experiences. Thank you again, and I hope you enjoy the rest of your day.
Thanks so much for having me.
[00:40:02] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and The Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story.
And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Sponsor Messages
Guest Introduction: Purvi Shah from American Express
Overview of Customer 360 Project
Scope and Objectives of Customer 360
Data Cleanup and Entity Resolution
Use Case Approach and Project Planning
Challenges and Lessons Learned
Technical and Architectural Decisions
Platform Infrastructure and Data Architecture
Integration with Legacy Systems
Data Quality and Observability
Ongoing Maintenance and Support
Future Directions and Capabilities
Innovative Uses and Unexpected Outcomes
Lessons Learned and Advice
Future Plans and Focus Areas
Closing Remarks and Contact Information