Summary
The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository of information about how your customers interact with your organization they drove a wave of analytics about how to improve products based on actual usage data. A natural outgrowth of that capability is the more recent growth of reverse ETL systems that use those analytics to feed back into the operational systems used to engage with the customer. In this episode Tejas Manohar and Rachel Bradley-Haas share the story of their own careers and experiences coinciding with these trends. They also discuss the current state of the market for these technological patterns and how to take advantage of them in your own work.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Go to dataengineeringpodcast.com/montecarlo and start trusting your data with Monte Carlo today!
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
- Your host is Tobias Macey and today I’m interviewing Rachel Bradley-Haas and Tejas Manohar about the combination of operational analytics and the customer data platform
Interview
- Introduction
- How did you get involved in the area of data management?
- Can we start by discussing what it means to have a "customer data platform"?
- What are the challenges that organizations face in establishing a unified view of their customer interactions?
- How does the presence of multiple product lines impact the ability to understand the relationship with the customer?
- We have been building data warehouses and business intelligence systems for decades. How does the idea of a CDP differ from the approaches of those previous generations?
- A recent outgrowth of the focus on creating a CDP is the introduction of "operational analytics", which was initially termed "reverse ETL". What are your opinions on the semantics and importance of these names?
- What is the relationship between a CDP and operational analytics? (can you have one without the other?)
- How have the capabilities of operational analytics systems changed or evolved in the past couple of years?
- What new use cases or capabilities have been unlocked as a result of these changes?
- What are the opportunities over the medium to long term for operational analytics and customer data platforms?
- What are the most interesting, innovative, or unexpected ways that you have seen operational analytics and CDPs used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on operational analytics?
- When is a CDP the wrong choice?
- What other industry trends are you keeping an eye on? What do you anticipate will be the next breakout product category?
Contact Info
- Rachel
- Tejas
- @tejasmanohar on Twitter
- tejasmanohar on GitHub
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- Big-Time Data
- Hightouch
- Segment
- Customer Data Platform
- Treasure Data
- Rudderstack
- Airflow
- DBT Cloud
- Fivetran
- Stitch
- PLG == Product Led Growth
- ABM == Account Based Marketing
- Materialize
- Transform
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Your host is Tobias Macey. And today, I'm interviewing Rachel Bradley-Haas and Tejas Manohar about the combination of operational analytics and the customer data platform. So, Rachel, can you start by introducing yourself? Yeah. Hi. My name is Rachel Bradley-Haas. I'm a cofounder of Big Time Data, which is a data consulting company that helps people build end-to-end data platforms
[00:01:49] Unknown:
all the way from collection to taking action on it. Tejas, you've been on the show before, but for anybody who hasn't listened to that episode, can you introduce yourself as well? Hey. I'm Tejas, one of the founders of a company called Hightouch that helps companies take action on top of the data in their data warehouse by moving it into systems that business teams use, like Salesforce, Marketo, Braze, or Facebook Ads.
[00:02:10] Unknown:
And going back to you, Rachel, do you remember how you got involved in the area of data? I like to say I'm the laziest developer there ever was. And because of that, I try to automate everything. And the best way to automate everything is to use data to make data-driven decisions.
[00:02:24] Unknown:
So, honestly, that's how I got into data, and ever since then, I've never looked back. Tejas, how about you? Yeah. So I actually got into data and the whole data space by joining Segment in January 2016. I was an early engineer at that company, and it's one of the leading players in the customer data platform space. I found out about Segment by actually being a customer of the service a couple years prior when it was just like a 5 to 10 person startup, and that's how I got introduced into the whole space. Always interesting hearing people put years to certain events because, you know, looking back, it seems like some of these services are either brand new or they've been here forever.
[00:02:58] Unknown:
And it seemed, you know, 2016, Segment has been sort of ubiquitous. It seems like it's been around for a long time, but 2016 seems like, you know, just a few short years ago. So Yeah. That's fair. At the intro, I mentioned that we're talking about operational analytics and the customer data platform, and those are two concepts that seem to go kind of hand in hand. But for people who aren't familiar with the overall idea of a customer data platform, can you just start by giving a bit of a definition about what that encompasses and some of the sort of capabilities that it entails?
[00:03:29] Unknown:
Yeah. For sure. So the idea behind a customer data platform is that it's a central database of customer information that's actionable. So I would say actionable is actually the key word there. The idea is that it's not just a database of customer information, but also a database that has features that allow you to actually move that information to different systems that might be used by business teams or marketing teams around your company so that you can actually use that data to power customer experiences, whether it's affecting how a salesperson reaches out to a customer at a B2B company or affecting the actual content or targeting of a marketing campaign in a B2C company.
[00:04:04] Unknown:
So for some context on the overall space, CDP or customer data platform, it's a bit of a loaded term. Sometimes it refers to the off-the-shelf solutions that are called customer data platforms, like Segment or mParticle or Treasure Data. And then sometimes companies just call internal systems that they build to help themselves better use customer data their customer data platform as well. Yeah. I was just gonna say the only thing that I think is really important that I continue to remind people that I work with every single day is we collect massive amounts of data, and there's so much money that goes into how you store your data, how do you stream it, how do you do all these things. And I think it's important to remember we do that for a reason, to take action on it. So if we're not putting it in a place that people that want to be strategic, like marketing, sales, growth, CS, all those things, can act on, it's basically useless, in my opinion. So it's really important that you end up investing in the CDP to be able to do things with it. A lot of the times, we see data engineers processing all this data, streamlining it, doing everything with it. And it's like, well, if you're not producing it in a way that someone can use it to make decisions, it's a waste of time and money, in my opinion.
[00:05:08] Unknown:
And on that note of being able to structure it in a way that you can make decisions based on it, what are some of the challenges and complexities that organizations and engineers encounter when they're trying to build this system and establish a unified view of their customer interactions and all of the different communications that they might have with those customers in ways that their customers are engaging with them? I think
[00:05:31] Unknown:
one of the most important things is having, this is gonna sound like such a buzzword thing to say, an analytics engineer. And the reason I think that is because you need someone that can speak both languages. So you have people that are working with data engineers that have to understand the technology, how to build things scalably, how to not have a bunch of one-offs, but they also have to understand how that data is gonna be used and consumed. You know, being able to understand what is an MQL, how are people gonna do this lead enrichment, what are they doing with it, what is an outreach sequence, all those different things will impact how you model your data for performance, and you have to have someone that understands all of the specific caveats. What's an SDR versus a BDR? And all those things. And I'll tell you, not a lot of people are interested in kind of being that, I would call it, translator. And so having someone that can be a translator between technical skills and how the business uses that data is so important. So it's not only the right tools but the right people, and then kind of that whole process of how do you standardize it to make it scalable. So I don't know if I answered what's the difficult part, but it's like that's the overall strategy of how it needs to be approached. And I think
[00:06:36] Unknown:
that person that controls that integral component is the hardest person to find at any company. Yeah. I totally agree with that. And I would say that the whole movement in analytics engineering and data tooling to allow you to do more with just SQL instead of having to learn all these coding skills like Python or super technical products like Airflow is really all to allow analytics and data professionals to focus more on the business problems and learning about those things than a bunch of specific technical skills. So I think this wave of analytics engineering tooling just reduces the barrier to entry to actually solving problems like data modeling or data integration, and instead allows
[00:07:16] Unknown:
analytics and data professionals and analytics engineers to really focus on the tough problem, which is translating business requirements and business problems into the right technical solutions. Yeah. I completely agree with that. Actually, one of the things I would just say is, like, the way that tools have enabled people to just click and drag and drop and do things when getting data in and being able to just have a basic job run where you're not having to set up your own Airflow even for yourself. I mean, it is helpful when you're trying to do Python stuff, but when you're just talking about setting up dbt Cloud and setting up Stitch or Fivetran or anything like that, you all of a sudden have these tools where you don't need to be able to make custom API scripts to, like, go and call and pull this data, and you can focus more on, like, what is the business logic you need to build in to be able to get the results you want and have it be in a transparent way that's in a Git repository somewhere so it's not hidden in views or in one-off things. Right? Like, nothing's worse than having business logic dispersed across different systems and not understanding where things come from. So I think that's been a huge change that makes our job a little bit more scalable.
[00:08:20] Unknown:
And another interesting element of the concept of a customer data platform is the definition of what a customer might be, sort of how you think about engaging with those customers. And particularly, if you're in an organization that has multiple different product lines, like, what does it mean for them to be a customer? Is it a customer of this particular product, or is it a customer of the entire organization? And how do those different concepts and scalability complexities of understanding sort of how to segment those customers come into play when you're designing and building out these platforms.
[00:08:49] Unknown:
I wanna confirm what you mean by, you know, different customers. So I'm just gonna give an example. You can tell me if it's right or wrong in terms of what you're thinking. So, personally, our company deals with a lot of PLG growth, but also at the same time, enterprise customers that maybe signed up, you know, like, immediately went to paid and child and never did anything. I worked at Heroku for a while, and we had all the way from freemium all the way to huge enterprise customers that were also going through Salesforce first before they became Heroku customers. And it makes it really difficult because you have potentially two or three different sources of truth of who's paying for a product and whether or not you consider these free users
[00:09:28] Unknown:
customers as well. Is that what you're talking about? Yeah. Exactly. Just, you know, you might have a business. The Heroku and Salesforce example is a great one where, you know, as you said, if they're a free user, are they a customer? If they came in through Salesforce, are they Heroku's customer, or are they Salesforce's customer? Or if they're using both, are they still the same customer, or do I have to count them separately? Those are some of the interesting complexities that arise as a result of those interplays within the organization and across different product lines. Yeah. I mean, I think that's one of the things that's been really difficult. And so when I was at Heroku, we were lucky enough to have what I would consider the OG reverse ETL.
[00:10:04] Unknown:
There was Heroku Connect, which was syncing between a Heroku Postgres database and Salesforce, and we would have never been able to manage the freemium to enterprise or enterprise back to freemium motion without a lot of that automation in place. It's one of the reasons why I am so passionate about reverse ETL, because I saw the power of it very early on, back in 2016, when Tejas was at Segment. And one of the things that's really powerful about it is you allow building how you want to surface that in a sales tool like Salesforce. And so because you have this ability to make these complex decisions in a separate tool, and then mirror it into a system in a more standardized way, that's how we were able to handle those different things. You can choose when and how you decide what a customer is. Do they have to have an important action in the product, even though they're a free customer, to be considered a customer worthy of being in Salesforce and things like that. So that's really how we've managed it, but I'm sure Tejas has a different approach as well. Yeah. I would largely agree with that. And I think the warehouse is actually the best place to
[00:11:17] Unknown:
answer some of these questions, like what a customer is, who a customer is, what is a customer across different platforms and different communication channels and different data sources. A lot of companies are looking for kind of a silver bullet when it comes to identity resolution for customers or entity resolution or building a single view of the customer. But in reality, I find that most companies kind of outgrow these generic solutions very quickly and need to build their own SQL queries and sort of formulas to establish what a customer is inside of their data warehouse. And a data warehouse that allows you to, you know, query data any way you want with the power of SQL is really the only solution that's flexible enough to adapt to the needs of companies. The other thing I would mention is that it's not just about having multiple product lines as well in the case of a company like Heroku that's now owned by Salesforce. But even if you have, you know, different data sources flowing into your warehouse, like data from an analytics system like Segment, and then data coming in from an ad system reporting on your ad performance, like Facebook Ads, or data coming in from a webinar system, you might still need to do some basic identity resolution to merge the data between all of these different systems. And a data warehouse where you can join in SQL and build your own queries and transformations is really the place that allows you to iterate on this definition of what a customer is over time and freely as it sees fit to your business. So I think a lot of companies are looking for a silver bullet here when there actually really is not a silver bullet. What you wanna opt for instead is the flexibility to be able to iterate freely and continuously on the definition of what a customer is. Digging more into the sort of technical and operational aspects of the customer data platform, you mentioned data warehouses a few times. 
And the introduction of cloud data warehouses has definitely
[00:13:02] Unknown:
brought in a new wave of interest in how to use these systems and, you know, business intelligence and data warehouses, things that we've been using for decades. And I'm wondering if you can just talk to some of the ways that a customer data platform is distinct or disjoint from just having a data warehouse and a BI dashboard to be able to understand sort of what are the interactions with your business, you know, across your customers?
[00:13:26] Unknown:
It's an interesting question. I think CDPs are, like, conflated terms that it's hard to answer generically for how all of us are thinking about it. But, really, what I kind of saw at Segment was that CDPs and marketing tech solutions were actually some of the earliest companies to adopt some of this cloud data warehouse technology like Snowflake and BigQuery. Actually, a lot of Snowflake's early customers were advertising tech and marketing tech companies. At Segment, we were heavily using BigQuery before a lot of our customers had adopted BigQuery to power a lot of our CDP features on the back end that allowed marketers to slice and dice data, move it to different systems, build an identity of a customer, etcetera. Something that's been interesting is, originally, these cloud data warehouse solutions and the most modern data warehousing technology that we use today was often used by these marketing tech and data vendors inside of their own kind of proprietary products. But what's happened over the last 5 to 6 years is that every company has wanted to invest in data analytics and data engineering and data warehousing and BI internally, and every company is building their own data warehouse that actually represents a source of truth for information across all different data sources of a business.
So, originally, we didn't even have data in a central place, so they had to first look towards solutions in the market like CDPs that helped you both collect data, transform it, manage it, and then sync it to other places. Now if you look at most companies, companies already have a data warehouse as well as tooling that helps you get data into it, build models inside of it, report on it in BI. And the real last problem that people are trying to solve when it comes to customer data platform is how do you activate that data, or how do you use it for marketing, for sales, and for different operations of your business? So I think, like, if we were to build reverse ETL and operational analytics and Hightouch 5 or 6 years ago, it would have technically worked, but not enough companies would have had the prerequisites, like having all their data in a data warehouse and having clean data models in a data warehouse for it to be useful for them. But if we fast forward, CDPs didn't really grow as fast as other technologies in the whole space, like Snowflake, like BigQuery, like dbt. And what we're seeing is plenty of demand for customers to kind of turn their data warehouse into a live customer data platform that not only influences analytics, but also influences the operations around a business.
[00:15:50] Unknown:
In my opinion, I think of the data warehouse as being, like, the base foundation if you do it right for a CDP. Right? So it's like you can turn your warehouse into a CDP if you have the right tools, but, like, that's why I've always called it a data warehouse and not CDP because I think it can sometimes get confused with a lot of off-the-shelf things, which I feel only do 80% of what you need them to do. So being able to do something in house where you combine different tools and get 100% of what you need, realistically, 95%, but we'll just say 100%. That's kind of been my opinion on why I call it a data warehouse versus a CDP. So
[00:16:28] Unknown:
You mentioned the sort of activation of the data and, you know, we've mentioned the term reverse ETL and operational analytics a few times. And this is a trend that seems to be going hand in hand with the growth of cloud data warehouses and the focus on using them for customer data platforms. And I'm wondering if you can talk to some of the semantics between the initial term of reverse ETL and the now more widely used term of operational analytics and some of the ways that that sort of evolution of terms reflects the evolution of the ecosystem and the ways that it's being used and sort of what you think is the relative importance of reverse ETL versus operational analytics and its relation to this idea of the customer data platform?
[00:17:12] Unknown:
Yeah. This is definitely a tough one. I think at Hightouch, we've been monitoring all the terms pretty closely to see which ones customers are using more. So I have a little bit of a quantitative answer here, but reverse ETL has been growing a lot faster than operational analytics when it comes to what people are searching on Google and stuff like that. Operational analytics does have, I think, more or an equivalent amount of, like, searches per month, for example, which I think is a pretty good indicator for what term is picking up. But the reason it has a lot of searches per month is that it already had a lot of searches per month before companies were using the term in the context of reverse ETL or data warehouses because operational analytics also means a lot of other things like analytics on your business' operation. So, personally, I'm not a huge fan of the term operational analytics. I think it's just like customer data platform. It's a bit too generic and confusing for some customers. For example, I have a friend who's a former customer of ours. Ed Cloudby now works at FanDuel, and he's the manager of operational analytics there. When I was chatting with him, it turned out that that was just analytics on FanDuel's operations and had nothing to do with operational analytics in the way we use it at Hightouch. So I would say the distinction is really that reverse ETL is a specific technical process of moving data from the data warehouse into these business tools, and it's like a very specific way to solve the problem of making data self-service or of allowing companies to activate their data. And then operational analytics is more just the general idea of putting your analytics to work and using all the work you've done in analytics also for the live operations of your business.
[00:18:48] Unknown:
But personally, I'm much more a fan of terms like data activation or activating data or operationalizing data than operational analytics just because I think it can be confused with other things. I completely agree. I think the reason why I don't love reverse ETL is because it's just so much more, the way I view it, if we were to put a price on things. Right? So, like, say you're extracting, doing typical ETL and you're bringing a bunch, like, billions of records into your warehouse, but granularly, none of those are really that important. You have to model them, understand how they relate to your overall customer journey, which customers they are, are they in your CRM system, do we care about them at all. And then this very, very high value result is what's being sent somewhere else. So it's like, oh, come on. We gotta give, like, this reverse ETL more power. Like, we can't just be like, it's the exact opposite of ETL because it's like, no. You're sending these high value data points to different tools that someone's going to act on and do something with versus, like, ETL. It's like, I'm just gonna hit an API endpoint and get data in. So it's like, I don't love reverse ETL, but I agree it's kind of confusing to call it operational analytics because there are a bunch of people that really do analytics on operations. So it's like, what does that mean? But I think one of the biggest things that's been really interesting is there are different people that own different components of the business. We'll just take Salesforce for example.
I don't want to have to update my code every single time someone wants to change a process in Salesforce. Sales owns their own process. They own what they wanna do with the lead, when does it MQL, all those things. I don't wanna have to be changing code on my end. So what's important is I say, here's the valuable data that you can build a process off of and take action on. You own the definition of how you wanna take action on it, but I can surface you the source of truth of these customers. And so I think it's really important that what reverse ETL tools do is allow people that know their area of the business to act on a single source of truth in a scalable way. And so that's why I think it deserves more than reverse ETL, but I don't love operational analytics.
[00:20:50] Unknown:
Yeah. I think terms are tough. We're trying to push Yeah. Yeah.
[00:20:54] Unknown:
I'm not gonna lie. I had to Google some of the terms on here because I was like, is it what I think it means? I was just like, oh. Like, I know this space and I can speak to it, but, you know, when people coin terms, you're just like, am I thinking the same thing as what they're thinking? So, like, with CDP, I always think Segment. And to say that Segment is this all-powerful thing that's gonna fuel these things is like, no. Segment data, bringing it in and modeling it with a bunch of other stuff, is the way I view, like, the evolution of the data platform. And so when people say CDP, I'm like, I don't want just a CDP that doesn't solve all my issues, but then the CDP we're talking about here is a combination of all these tools and being able to act on customer data. Yeah. I think what's tough is when some of these vendors get sufficiently large, they pick a generic term that can encompass
[00:21:39] Unknown:
all the product surface area that they'd ever want to build, like customer data platform. I can't imagine a term that's more generic than that. I don't know. What about CRM,
[00:21:48] Unknown:
customer relationship management? Fair enough. Fair enough. I can't tell which one's more generic, honestly. That's how you know when you've made it. It's like you just own the most generic term and everyone thinks of you. Yeah. Maybe the next evolution will just be customer platforms.
[00:22:02] Unknown:
Yep. Yeah. Another gripe with the idea of reverse ETL as a term came up in a conversation I was having recently, which is that ETL as a discipline has no implied directionality.
[00:22:15] Unknown:
So, you know, calling it reverse ETL is kind of pointless because there was no direction for it to be pointing in in the first place. So Yeah. I think the only thing I really like about the term reverse ETL is despite it maybe not making sense, it does immediately click for a lot of customers, and a lot of customers just actually think of it that way. Like, I remember, originally, we didn't want to adopt the term as a vendor. It was something that we heard in communities. It was something that customers would say to us, but we didn't really wanna adopt it because it felt kind of lame and too specific. But then we realized that it's rare to be able to start a company and within a year or 2, have a term that can be widely linked to your company. So we decided to just go with it. I mean, customers were literally asking us in the call, so it's like the reverse of ETL or the reverse of Fivetran or the reverse of Stitch.
So it's just
[00:23:05] Unknown:
inevitable that we had to adopt the term. Yeah. It's interesting because I've actually heard people say now that Stitch and Fivetran shouldn't be ETL. They should just be EL, and then DBT is the T. But I do think in your specific example, you do have a T because you're taking this raw data and transforming it in a way that needs to be consumed. I honestly think you all have the harder job than Stitch and Fivetran because you have to deal with all the errors that come back the other way or changing, you know, models or whatnot that you have to deal with. So it's like you all do reverse ETL because you have to extract it and then transform it and load it, versus, like, the Stitches and Fivetrans are actually just doing the E and L.
[00:23:45] Unknown:
Yeah. And in my opinion, the other really hard part about reverse ETL that's not really conveyed in such a technical name is that you're really not just building a platform to move data points around, but a platform that's kind of cross functional that allows, you know, data teams, like technical folks, to also collaborate with other folks in the business, whether it's sales ops or marketing ops or marketers directly. And that's really a tough design problem. I mean, it's something that we've thought super deeply about at Hightouch, like, kind of creating parts of the app where you manage data models that have technical features like integrations with DBT and stuff like that, and then having separate parts of the app that can be consumed by, you know, people who are used to managing Salesforce or Facebook ads, etcetera.
It's really understanding both personas. That's a challenge versus if you look at a product like Fivetran or such, I think, sometimes they're even calling themselves just data replication instead of ETL as well. It's really just replicating data points into the warehouse. The technical user can do whatever they want with it, but there's not a huge design problem in, like, how you express the business workflows.
[00:24:48] Unknown:
Yeah. I'll just add on that. I'm not gonna name names, but I actually know 2 of your competitors. And 1 went really far the technical route of the UI, and 1 went way more the operational analytics way. And it's just like 1 was way too technical for me to use, and 1 was way too bland for me to use. And so it's like I think it's a happy medium when you can support both personas because you really do have to make data engineers feel comfortable connecting their CDP, warehouse, whatever you wanna call it, and then letting a marketing person access it and have some freedom. It's a little bit nerve-racking because at the end of the day, if a mass email goes out, it's not gonna be the marketing person that gets blamed for the wrong data points.
[00:25:30] Unknown:
That's true. Yeah. In my opinion, I think a lot of the innovation in the space will come not just from kind of making it easier to build a lot of integrations, but also in terms of product features that make it really easy to hand off between the data team and the business team. So I think that collaboration layer is really the biggest area of opportunity for any company in reverse ETL, operational analytics, or customer data platforms. Absolutely.
[00:25:54] Unknown:
And the whole space of reverse ETL or whatever term we decide to cement on as the time goes by. It's only about a year or 2 old in terms of sort of as a product category. I mean, people have been doing it forever, I'm sure. But as a distinct product category, it's relatively recent. And I'm wondering what have been some of the changes in terms of the focus for you at Hightouch, for the industry, and for end users of the product as far as who are the target users, how are they using it, how are they communicating about it within the organization, and sort of what are some of the interesting evolutions that have happened over those past year or 2? I think
[00:26:36] Unknown:
1 thing on our end is it's become a lot easier to sell our services and our platform at Hightouch because customers now come to us with knowledge of what we're doing beforehand, which is a really new experience compared to, like, let's say, a year or a year and a half ago when we had to explain everything from scratch in terms of our whole approach when we talked to customers because reverse ETL was just, as you mentioned, not a concept that people were familiar with. 1 of the big challenges that's rather obvious when it comes to building a reverse ETL platform is, initially, when we started building the platform, we had a few key use cases in mind. You know, we'd help enrich systems like Salesforce for b to b companies, and we'd help enrich marketing and advertising platforms for b to c companies. But what we found in serving the data teams is data teams are so cross functional and in ways that we really didn't imagine, honestly. We've started building integrations to finance systems like NetSuite. We have requests to build integrations with systems like Anaplan and SAP.
We've built integrations with systems like Zendesk or Intercom on the support side, and even things all the way down to just, like, everyday business workflow tools like Slack and Asana. I really think it'll be interesting to see how this space plays out and if there's certain companies that focus on doing reverse ETL for certain types of business workflows that take off, kind of vertical specific operational analytics or reverse ETL providers. At Hightouch, we're going broad, and we're really focusing on enabling the analytics engineer or the person who thinks both about business problems as well as about SQL and data modeling, etcetera, to deeply solve and address the needs of their business users no matter what team those business users are on. So we're addressing sales things, marketing things, finance things, customer success, and kind of all across the board. But it'll be interesting to see how that plays out. I personally think the biggest area of opportunity, and what's really gonna be necessary to solve this problem throughout the ecosystem, is to have, like, a really good interface for the business users to be able to collaborate with the data users all through the reverse ETL platforms. So it's a complex problem, and it's not 1 that'll be solved in 1 year, but we've started chipping away at it at Hightouch by building certain vertical applications on top of the reverse ETL platform. For example, our audiences product, which kind of allows marketers to come into the Hightouch app, select different subsets of customers to sync to their marketing tools on top of data models that are predefined by the data team. And I think a lot of the innovation we'll see in the reverse ETL platform will just be more and more of these vertical applications that allow business users to safely and effectively get their hands closer to the data layer without having to build definitions or SQL queries from scratch.
[00:29:16] Unknown:
When you're talking about being able to integrate with things like Asana or Trello, it, you know, brought to mind things like Zapier, which is a tool that these teams would have used, you know, maybe a couple of years ago to be able to link together all their different workflows. And I'm wondering what you have seen in organizations that you're working with as far as the kind of relative popularity of these point to point evented workflows versus the hub and spoke model that the sort of Hightouch and reverse ETL platforms enable?
[00:29:44] Unknown:
Yeah. So, honestly, we still see a lot of that. Actually, a lot of our customers end up starting with kind of an event based point integration platform like a Zapier or Workato or Tray.io. We kind of see all of these systems pretty frequently at Hightouch. And then when they realize they need to operate on a more full view of the customer in order to build the workflow they're actually looking to build, that's when they realize they wanna tap into the data warehouse. And sometime along that chain, they discover that there are reverse ETL platforms like Hightouch that allow them to more easily do that. So I think that's a huge part of the market and a huge part of the story. We see a lot of our customers using those systems, and oftentimes we think of Hightouch as Zapier for customer data. And because it's for customer data, naturally we're built on top of the data warehouses. I really see that evolving as the source of truth for customer data across all types of companies and across all maturity of companies. Not to say there's not good use cases for tools like Zapier as well. There are types of data integration problems or business workflow problems that just don't make sense or don't have a clear advantage to go through the data warehouse. Like, we do things like push calendar schedules on Calendly into our Slack via Zapier, and I see no reason to do that sort of thing via Hightouch. It doesn't relate to a bunch of data across a bunch of different systems or kind of source of truth models. It's just plugging a few things together. Or at Segment, we plugged Jira into Google Sheets to do some more planning, and there's really no reason to do that in a data warehouse either. But when it comes to workflows that are really thinking about customer data and your source of truth for customer data, I think those will end up gravitating towards the warehouse.
[00:31:21] Unknown:
The other interesting thing is the sort of link between having all of this customer information in your data warehouse and this, you know, apocryphal CDP and the rise of the sort of reverse ETL category. And I'm wondering what you see as the viability
[00:31:37] Unknown:
of, you know, 1 existing without the other, whether it's using reverse ETL without having the CDP, just being able to, you know, maybe aggregate across multiple systems instead of having it all in 1 place or using a CDP without then having the reverse ETL to populate information back out into the systems that you harvested it from? I can see a world in which a CDP could exist without reverse ETL because, obviously, that has happened. I don't think it would be a very useful CDP. It'd be a lot of manual processes, and you'd basically just be building dashboards on top of it in a visualization tool, then downloading it and then manually loading it somewhere else. Or you would have a lot of 1 off scripting, which is kinda how it was before, where you're making random API calls to Salesforce, getting errors back, not knowing how to handle a lot of different things. I would say, in its own way, it's kind of like how Snowflake makes it so you don't really need a DBA anymore. It's like all these errors that you have to deal with and all the APIs you have to custom learn to be able to write to these 3rd party tools, Hightouch does for you and reverse ETL does for you. So it's like you're able to kind of act on it a lot quicker and you need less headcount, in my opinion.
But I still think it's possible to have a CDP without reverse ETL. Like I said, I just don't think you'd be getting your money's worth. In terms of a reverse ETL without CDP, I don't know where they'd be pulling their data from, maybe Google Sheets, but I know that that is something that's a source. But I don't think, once again, you'd be getting a lot of value out of the reverse ETL tool if you're not centralizing everything in 1 place. 1 thing I would say is I really love the new thing around how everyone's saying, oh, warehouse first approach. Right? And so I think the warehouse first approach is really we're kind of considering the CDP here, where it's like get everything into your warehouse, centralize it, then send those valuable events.
What is difficult is if you basically are sending the same data to a bunch of different sources using reverse ETL but you never have it centralized, then you're gonna have data flowing into Salesforce from 5 different places, maybe Zendesk, Jira, product data, something else. Right? And then you're gonna clog these individual systems with irrelevant data and having to replicate the same business logic in each of these different tools. And, inevitably, your copy and pasting is gonna break. You're gonna have different data, different places. So
[00:33:53] Unknown:
I think either one's a little bit janky if you do 1 without the other. Yeah. I think the market has really been saying that, basically, the data warehouse is the new CDP. I mean, the toughest problem in CDP is centralizing all your data in 1 place. Segment is 1 of the leading, like, off the shelf kind of CDP providers, and at Segment, we would always say, you know, in the future, all companies are going to be having a first class feature to send data into Segment. If we look at where the market is now, that didn't really happen. Instead, what's really happened is that every company is either publishing or kind of sponsoring some sort of first class way to get their data into a data warehouse instead. Peray, for example, has a native connection to data warehouses.
Fivetran, for example, has hundreds of connectors to replicate data from different source systems into data warehouses. And it's become a lot easier to centralize data in a data warehouse than it is to centralize data in any other sort of proprietary platform. And I think that alone is the core reason why the data warehouse will become the customer data platform. And really what's missing from that stack is a good standard way to take action and activate data, and that's what we aim to be at Hightouch with reverse ETL as, like, 1 of the core underlying processes for how we do that.
[00:35:10] Unknown:
Struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need to look no further than Monte Carlo, the world's 1st end to end, fully automated data observability platform. In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem with broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL and business intelligence reducing the time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today. Visit dataengineeringpodcast.com/impact today to save your spot at Impact, the Data Observability Summit, a half day virtual event featuring the 1st US chief data scientist, the founder of the data mesh, the creator of Apache Airflow and more data pioneers spearheading some of the biggest movements in data.
The first 50 people who RSVP will be entered to win an Oculus Quest 2. Over the sort of medium to long term, I'm wondering what you see as the opportunities and new capabilities that people will be asking for for this space of reverse ETL and customer data platforms to be able to more effectively close the loop of information and interactions with customers and maybe some of the additional sort of industries or verticals that this pattern can be applied to? 1 of the things that I think it's missing
[00:36:45] Unknown:
in general, the entire data platform, is a lot stronger orchestration in terms of when data's coming in, when it's being modeled, when it's being sent out. There are some very, I guess, basic things that you can do when you're using, you know, gosh, I'm blanking on it, Airflow. But I think in general, understanding this is when your data's coming in. You have the most up to date view you will ever have of this customer's data, and then this is how it should be modeled. All of your dependencies, which is great. Like, once you kick off, you know, a dbt job or something like that, it's really great because the dependencies run 1 after the other. And then some of the reverse ETL tools have the ability to say, okay. I'm gonna kick off this job and then send this data other places. But there's a whole feedback loop of, like, when you send it to a third party tool, when it's acted on by a customer, when it syncs back into your data warehouse. So it's this whole cycle of data, of sending, interacting with a customer, and receiving feedback. So I think we need a way to strongly relate all these feedback loops and streamline them. I don't necessarily think that something like full on streaming is necessary in this industry right now. And I'm only gonna say that because until every single tool does streaming, you're only as fast as your slowest tool. And I guarantee, not everyone's gonna be streaming in the next 10 years. So I think 1 of the biggest things is that you just need to fine tune how quickly, how often you can get that data in for the right price and making sure that you're not being wasteful. Right? So you don't need to be streaming, or syncing every 5 minutes, certain yearly financial data. But that whole data orchestration of making it move as quickly as possible allows
[00:38:19] Unknown:
you to reach out to your customers, make a quicker impact, and so on. That totally makes sense, and that's something we're thinking pretty deeply about. And another area which I think is a really big opportunity for reverse ETL platforms and just for every platform in the space is right now a lot of the platforms have features to do alerts in tools like, you know, Datadog or PagerDuty when a sync fails, and that's kind of table stakes in my opinion. But to really take it a step further, I think there needs to be a really good workflow around preempting failures. So, you know, kind of alerting the user that, hey, this sync is probably gonna fail because of some misconfiguration or something that you're defining in the sync is invalid with the way the destination system works, or giving technical users or even semi technical users a good interface to write tests or assertions for the syncs or kind of test individual rows or a few batches of rows before running the whole sync across your millions of records in your data warehouse. So we built some interfaces for this sort of stuff in, like, test row and stuff like that. But I think there's a ton of opportunity here, and I think the ecosystem and all the vendors are kind of far from figuring out the ideal solution here, but it's a really large problem. I mean, as you make it more accessible to do large data transfers and large data integration work, you also need to make it safer so that as people who aren't used to writing scripts and tests to do all these things start performing these processes, they don't mess anything up as well. As Rachel mentioned earlier, when the marketing email goes out to the wrong people, it's not gonna be the marketer's fault. It'll be the data person's fault, and I think there's a lot of innovation to be done in the whole reverse ETL tooling to really prevent issues like that. I actually think that brings up a really good point. Actually, 2 good points. 1 is audit logs. Right? 
So I think 1 thing that's really great is that you can kind of see where data is being sent, when and why. And
[00:40:10] Unknown:
not to say that people point fingers, but it's really nice to know that you have protection to be able to say, this is why this data point was sent. This is underlying business logic. We're not making this up. We didn't randomly decide to send the wrong data somewhere. Right? So, like, being able to I will tell you, creating an end to end staging environment for the new complexity of, like, CDP end to end data platform is so much work, and it's almost a full time job keeping it up and running. And then it's like even creating a staging environment you need a staging environment for your staging environment to make sure you don't screw up your staging environment. And it's like, how do you get all this data coming in and having to keep up to date without paying for basically 2 entire data platforms, but know that when you're running it, it's the right data to be testing it. So then you got, like, Salesforce sandbox coming in, then you have to have a separate staging environment in your Snowflake, then you have to have dbt running a staging environment, then you have to have it sync to a sandbox, and then when it's ready to go live, it's so much work to push it to production. Right? Like, that's so much work, but it's so important when you're engaging with your customers. So having something that could potentially streamline the whole end to end staging to production for this, like, very complex system would be worth a lot of money. So anybody listening, go and start that company. I'd be happy to to use it. Yeah. I agree. I think the way that I think about it is, like,
[00:41:41] Unknown:
vendors like us, for example, should be thinking about what would the absolute, you know, best in class engineering team that has all the time in the world build in terms of both tooling and processes if they wanted to build a great sync of data between the data warehouse and x y z destination systems around a company. Things like, how do you test rows? How do you alert when there's failures? How do you start us all working? How do you run tests before you make a change to the pipeline? How do you think about staging and production? And then the tooling should provide a really good way to do all of this stuff without thinking about every single detail, with ideally just SQL and some settings. And that's the biggest design problem that exists, both the technical design problem and just the user experience, user interface design problem. That's 1 of the reasons I think that reverse ETL and operational analytics is actually just so much harder than data replication and normal ETL, because the end to end requirements in an ETL platform are pretty simple. You just drop the data in the warehouse, and now you don't even have to transform it because people can go transform it with DBT. When it comes to reverse ETL, there's just so much more that can go wrong and so much more potential on what can go right as well. On the topic of sort of data quality
[00:42:52] Unknown:
and the other big trend that's happening in the past couple of years of sort of data lineage tracking and open metadata and being able to propagate this across all these different systems where in, you know, business intelligence dashboards, they're starting to have data quality indicators so that when you view a chart, you can see this is when the data was last updated. You know, maybe the data quality check upstream failed, so you should take this chart with a grain of salt. I'm wondering what you're seeing as the potential or any activity that's happening in some of these operational systems, whether it's HubSpot or Zendesk or Salesforce, to be able to expose those quality indicators now that you are feeding all of this information from an automated platform to be able to say, you know, this sync was only partially completed, so, you know, this record might not be fully up to date or something like that, and then being able to track that back into the system, like, Hightouch that is actually performing these replications.
[00:43:47] Unknown:
Yeah. I think that's a great idea and something I've been kind of noodling on. Don't know the exact right solution, but it would be amazing if you could go into a platform like Salesforce and see when the last time, you know, this deal was actually updated, what the definition of it is with, like, a link out to a platform like Hightouch or your metadata system that can actually tell you that, whether the value is not up to date due to some upstream data pipeline failure. And I think all the kind of prerequisite steps to being able to expose this metadata are in the works. You know, DBT is trying to track dependencies of DBT models outside of the scope of the DBT project with, you know, their exposures feature. There's a lot of features there that you can see online, like the metadata tiles that can tell you whether a certain model or transformation step has failed, and then, you know, Hightouch can know whether, thus, the data in Salesforce is not up to date yet because of something upstream. I think there's a ton of opportunity there to both track and consolidate that information and lineage of data, but to also figure out the best way to expose it to the business users without being overwhelming. Just exposing it when it's actually relevant, if that makes sense. And so in your experience
[00:44:59] Unknown:
of working with customers and building these reverse ETL capabilities or the CDP or the end to end integration?
[00:45:15] Unknown:
I can give 1 of the bigger ones that I actually ended up helping Hightouch implement internally too, indirectly. But, basically, we had a client that had a PLG motion. Right? And so when I say that, you know, they have users in their products that are either free or paying, they're using things, and they needed to understand how do I get them into the hands of sales in a scalable way without paying a fortune for Salesforce storage or whatnot. Right? So you have all these individual users. You don't know whether or not they're valuable yet, but how do I surface them in certain ways? Not only that, but now you have users that belong to a team that are connected to a Stripe account, and you're just like, okay. How am I gonna get this all in a scalable way into Salesforce for people to act on it, to be paid, comped, all those things correctly? So we were able to build basically our version of what we've said is a PLG supported ABM system using Salesforce. So we think about it in the way of, like, what is a purchasing entity?
In the past, before you had a lot of SaaS, everything was at an account level. And now you're dealing with things at individual team levels, purchasing entity levels, and all of that. So think about it as a Stripe customer. So what we need to do is be able to say these Stripe customers or these purchasing entities belong underneath these accounts based off of account domains or regions or whatnot, and these are the users that belong to those things. So you basically are able to use this data coming in from their product, call it like Mongo. You're able to model it, say, here's the teams. Here's the users. Here's their Stripe data or, you know, whatever billing Chargebee data. Here's what we know about them. Here's when we want to create an account in Salesforce based off of their owner domain or their billing domain, or is there already an account in there? You can go and then create purchasing entities, which we'll call an organization or a team, and then you're able to relate production product data to something in Salesforce. We're able to connect everything and make this reverse ETL process really easy because what you're gonna have is, at a product level, you're gonna have things at the purchasing instance level, and now you're able to replicate that in Salesforce, have those specific instances assigned to a sales team, have marketing be able to market to the users of those paid systems, all of this stuff. In the past, you would basically just have to attach users to an account and have this complexity of, like, 20 different hierarchies in Salesforce in a scalable way. And so it's been really fun. We've actually implemented that a couple different places, and people are amazed. And it's kinda like, dude, this is what we've been wanting to do for so long, and now it's a lot easier because of reverse ETL. So now you're not bound by, you know, what does Salesforce natively support? 
You're able to say, I'm gonna build my own architecture of a data model in Salesforce.
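The account, purchasing entity, and user hierarchy described in this answer can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the `Team` record, its fields, and the rule of grouping by billing email domain are hypothetical, and this is not how Hightouch or Salesforce actually model the data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Team:
    """Hypothetical product-side record for one purchasing entity."""
    team_id: str
    billing_email: str
    stripe_customer_id: str
    user_emails: List[str]


def domain_of(email: str) -> str:
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[-1].lower()


def build_account_hierarchy(teams: List[Team]) -> Dict[str, List[dict]]:
    """Group purchasing entities under an account keyed by billing domain.

    Each account (one per domain) holds a list of purchasing entities,
    and each purchasing entity carries its own users.
    """
    accounts: Dict[str, List[dict]] = defaultdict(list)
    for team in teams:
        accounts[domain_of(team.billing_email)].append(
            {
                "purchasing_entity": team.team_id,
                "stripe_customer_id": team.stripe_customer_id,
                "users": sorted(team.user_emails),
            }
        )
    return dict(accounts)


if __name__ == "__main__":
    example = [
        Team("team-1", "ops@acme.com", "cus_1", ["b@acme.com", "a@acme.com"]),
        Team("team-2", "it@Acme.com", "cus_2", ["c@acme.com"]),
    ]
    for account, entities in build_account_hierarchy(example).items():
        print(account, [e["purchasing_entity"] for e in entities])
```

In practice this grouping would more likely live in a SQL or dbt model in the warehouse, with the reverse ETL tool syncing the resulting rows to custom objects; the Python version only shows the shape of the mapping.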
[00:48:04] Unknown:
So I think those are super powerful examples overall, but I think something exciting for me is the really unique examples that we don't see too often. Like, for example, people not just moving data between 2 different systems, but also creating some sort of business workflow using reverse ETL. An example is 1 of our customers, Blend, and we've written about this and stuff like that in the past, but they're kind of a mortgage loan platform company. They recently went public, and they do push a bunch of data into Salesforce with what they know about customers from their data warehouse using Hightouch, but then they also do things like create Asana tickets automatically for their customer success team to go look into some unusual product usage information, product usage data that they're seeing in their data warehouse defined by a SQL query, or they create Slack alerts saying to go try out these features to their customers and to different folks around their organization when they notice certain patterns that indicate that customer might be ready to do that based on the data in their data warehouse. And I think some of these business workflow examples are pretty cool in that most people wouldn't typically think of using their data warehouse for these purposes, and they almost start to kind of chip away at BI use cases in a way where previously people might be opening a BI dashboard and refreshing it to alert themselves on some of these use cases or having a BI tool send them the whole report every single day, and then they click into it and see what's changed, to actually being able to power these deep business workflow type use cases entirely with Hightouch. And I'm seeing more and more demand in the market for things like this. 1 of our customers just messaged me yesterday and was like, we're looking for a communication alerting platform based on the data warehouse. And I was like, have you seen our Slack integration?
And then they were like, wow. I didn't know you could build business workflows like this. Obviously, she's like, Hightouch. I was trying to look all around, and no BI tool has a good Slack integration. So personally, I'm really excited about those types of use cases as well because I think they don't seem as sophisticated. They seem more like a Zapier type use case, but I think they're super, super powerful and really expand the accessibility of data throughout an organization.
[00:50:07] Unknown:
Oh, absolutely. I think it's 1 of the coolest things seeing, like, when we onboard customers to, like, Hightouch, and we do some basic integration, like, let's connect Salesforce. Oh, it's magical. We send ARR to your account level. Woo hoo. Like, you've been trying to do that for 5 years. And then all of a sudden they say things like, oh, I wish I could get a Slack alert when the ARR drops on certain accounts day over day. I'm like, well, you can do that. How? Oh, Hightouch. It's really interesting to see them kind of evolve their understanding of what they can do with the data in the warehouse because a lot of people are, I guess, data illiterate in terms of how much you can use it and where you can use it and all those things. So I think, once again, reverse ETL makes things, like data, more actionable
[00:50:48] Unknown:
in a scalable way. I think another element to that as well is that business users have been conditioned over the past several years about what's possible with data because, you know, they might ask for those kinds of things, but the answer is, yeah, that'll take about 6 months and $1,000,000 to build. And then the next request is another 6 months and another $1,000,000 versus, you know, with the level of sophistication that we've built up in these systems, it becomes much more feasible. And so they're, you know, shocked by the capabilities that they're being given because they're so used to the fact that the answer is going to be, yes. That's possible. But
[00:51:24] Unknown:
Absolutely. I think it has to do with the evolution of tools and then the initial foundation that's built by an analytics or data engineering team. So when you already have those base models in something like dbt, where you have this basic definition of all these different user attributes and different things, once again, it's so easy to just be like, well, we have that. Let's reuse the data we already have and do automation off of it. Let's not reinvent the wheel each time. And so the fact that these tools have evolved to support the scalable data modeling system and reuse all of that logic, it makes it so easy. It's funny to get someone on a call and say, okay. What data do you need? Okay. Click a few buttons to get that data in. I'm gonna write one minor script, and then I'm gonna click a few more buttons, connect to a new destination in Hightouch. There you go. In an hour, you have all the data flowing through the way you want it to. And they thought, oh my gosh. I thought this was gonna take me a year of backlog to do it. I'm like, no. You just have to be friends with the right person.
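As a rough illustration of the Slack alert on day-over-day ARR drops mentioned above, the core check can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the `ArrSnapshot` type, the function names, and the 10% threshold are hypothetical and are not part of Hightouch or any other product, and the code only builds a message payload rather than calling Slack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrSnapshot:
    """Hypothetical per-account row, as it might come from a warehouse model."""
    account: str
    arr_yesterday: float
    arr_today: float


def find_arr_drops(snapshots, min_drop_pct=10.0):
    """Return (account, drop_pct) pairs where ARR fell by at least min_drop_pct."""
    drops = []
    for snap in snapshots:
        if snap.arr_yesterday <= 0:
            # No baseline to compare against, so skip the account.
            continue
        drop_pct = (snap.arr_yesterday - snap.arr_today) / snap.arr_yesterday * 100
        if drop_pct >= min_drop_pct:
            drops.append((snap.account, round(drop_pct, 1)))
    return drops


def format_alert(account, drop_pct):
    """Build a simple message payload; actually sending it is left out of this sketch."""
    return {"text": f"ARR for {account} dropped {drop_pct}% day over day"}


if __name__ == "__main__":
    rows = [
        ArrSnapshot("acme", 1000.0, 800.0),
        ArrSnapshot("globex", 500.0, 495.0),
    ]
    for account, pct in find_arr_drops(rows):
        print(format_alert(account, pct))
```

The point of the sketch is that the logic is small once the account-level ARR model already exists in the warehouse; the hard part is the model, not the alert.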
[00:52:20] Unknown:
In your own experience of building and managing customer data platforms and working with reverse ETL and activating the information that's in the warehouse for these different business use cases, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:52:39] Unknown:
I honestly think the biggest challenge that still exists is developing the right data models in the data warehouse. Personally, I think if that's done well, that solves 90 percent of the pains throughout the organization. It helps the analytics. It makes the job of Hightouch a lot easier. But really, something that's still very hard and still very business specific is data modeling and really understanding all the different data systems that you're syncing into your data warehouse, how they relate to each other, how they should be joined together, how the definitions should be created on top of them, and then how you should iterate on those over time. And when I think about issues that our customers face, I would say 90% of them kinda flow back to that. Not building good data models in the source of truth or in the data warehouse, I think it's a huge challenge. I'm not sure if it's a technology challenge or just, say, a process, technique, understanding, knowledge type challenge. But the whole trend that I'm really excited about is analytics engineers shouldn't need to develop all these, you know, super specific technical skills like coding and Airflow and Python and etcetera, etcetera, etcetera, just to get their hands on data and build pipelines on top of it. And I think if we, as vendors, can focus on making the tooling really easy to use and just require SQL, then analytics engineers, data analysts, and data engineers can spend more and more of their time really understanding the business requirements and how to best do data modeling. But, yeah, the biggest challenge that I see really always comes back to not data integration, but the data modeling itself and getting that right in the source. Yeah. I agree with that. I'll add one point, though.
[00:54:11] Unknown:
I think a lot of it has to do with understanding, we'll call it, your internal end customer, whether it be marketing or sales or whatnot, and understanding what problem they're trying to solve and being very explicit about what each of these data points means, because there are gonna be assumptions made on both sides. And if you don't get over those assumptions, you might build something that doesn't really solve their needs or gives them inaccurate data to make decisions on. So I think, especially as an analytics engineer or an analyst or a data engineer that flows more into the business side, you really need to understand what problem they are trying to solve so that when you're going and getting this data and modeling it, it's actually solving the problem that they have, not the one you think they have. You also have to work with the data engineer and say, this is what I'm trying to solve. Am I looking at the right data? Do we have the right data? So I think it's a lot of really internal management of ideas and data understanding and all of those things that goes into the modeling that's actually more difficult than, like, writing the queries themselves or writing the models. Because if you understand what someone is trying to get out of it and you understand what the raw data means, it's a lot easier. But a lot of the time when someone comes to you, they don't know what they don't know, and you have to validate that you guys are speaking the same language.
[00:55:27] Unknown:
For people who are interested in being able to build some of these automation capabilities for their business, what are the cases where CDP or reverse ETL or the combination of the two is the wrong choice? I don't think it's ever the wrong choice in my mind, but I'm biased.
[00:55:43] Unknown:
I'm heavily biased too. I think there are some use cases that actually do need true real time processing. Mhmm. I think some example use cases are, let's say, post purchase notifications. Like, if I'm on, you know, let's say, jetblue.com and I buy a flight, even if my Snowflake syncs are every 5 minutes, as a customer, it's just not fair to wait 5 minutes to get a confirmation that, yeah, a flight was actually purchased, or a notification of anything of the sort, or for that information to register in a super important mission critical system. So then, yes, people can build mission critical pipelines on top of the data warehouse and on top of reverse ETL, and the tooling's only making it easier to do so. But if those pipelines need to run extremely quickly, like, let's say, under a minute or a few minutes, then it's definitely not a fit for reverse ETL as it stands today.
That said, we are thinking more broadly about how to help companies tap into all the data infrastructure they have to address those problems and those use cases as well. For example, building connectors to sit on top of, like, a Kafka or Kinesis queue and pipe a row from there over to an event API in something like an email marketing system so that they can immediately fire the email off for a post purchase confirmation. But I think real time and latency are, like, a few of the only issues that can't be solved with just changes to the reverse ETL type process today, and that's something that needs to be solved upstream, in the data warehouses or in the source technologies that reverse ETL platforms like Hightouch actually connect to. And while 90 or 95 or maybe even 99% of use cases don't need, like, super low latency, there are those use cases that do, and it'll really be beautiful when you're able to do all of that through one system.
[00:57:27] Unknown:
As you continue to work in this space, what are some of the other industry trends that you're keeping an eye on? And are there any particular product categories that you anticipate might be the next sort of breakout event along the same lines of, you know, CDPs and reverse ETL?
[00:57:42] Unknown:
I mean, there are two that come to mind. One's already kind of breaking out, but I have yet to see something that I feel makes a huge impact right now. I feel bad saying this, but, like, I'm waiting to see the evolution and who's the front runner for the data quality area. I think a lot of these things run post processing and the damage is already done in certain situations. So it's like, cool. Thanks for telling me after the fact. Now I have to rerun my jobs. Right? So I think it would be more interesting to see a more scalable, cheaper, less process heavy data quality tool, I guess. You know, a lot of these also sometimes take as much effort to get set up as they help you in the long run, so it's a little bit labor intensive at the start. So I'm interested to see how that evolves.
The only other one is I'm still interested to see a lot more on the streaming side. You know, I've been keeping an eye on Materialize, but as I mentioned, you're only as fast as your slowest tool. And since a lot of people in the CDP space are heavily reliant on CRMs and things like that, that's not gonna be coming in streaming. So unless you get streaming ETL coming in and all those different things, Materialize could be really cool in the long run, but every other tool needs to speed up before Materialize reaches its full potential.
[00:58:52] Unknown:
Yeah. I would say other than the two that Rachel mentioned, I'm also pretty excited about some of the stuff happening in the metrics layer space. As a vendor, we kind of think about how to make reverse ETL more and more accessible. How do we actually enable workflows where, you know, marketers and sales ops people can come to the system and say, I want these data points, or I want these data points plotted over these time periods, to be synced into my system, like number of shows watched in the last 7 days, without an analyst having to go in and define number of shows watched in the last 7 days, number of shows watched in the last 30 days, number of shows watched in the last 60 days? How do we make that whole process more streamlined? And I think it's having a better semantic layer around the data. And previously, the only good place for the semantic layer has really been Looker or LookML. It's the only widespread way, but I think there are a few initiatives. There are companies like Transform that are trying to make a more generic layer for this. There are also, you know, talks in the dbt GitHub issues and forums about dbt potentially playing in the space.
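To make the "shows watched in the last 7, 30, 60 days" example concrete, here is a minimal Python sketch of defining a metric once and parameterizing the time window, instead of hand-writing one definition per window. It is an illustration only: the function names and output column names are hypothetical and do not reflect how Looker, Transform, dbt, or Hightouch define metrics.

```python
from datetime import date, timedelta


def shows_watched(watch_dates, as_of, window_days):
    """Count watch events in the window of window_days ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days - 1)
    return sum(1 for d in watch_dates if start <= d <= as_of)


def metric_row(watch_dates, as_of, windows=(7, 30, 60)):
    """Expand the single metric definition into one column per requested window."""
    return {
        f"shows_watched_last_{w}d": shows_watched(watch_dates, as_of, w)
        for w in windows
    }


if __name__ == "__main__":
    history = [date(2021, 6, 29), date(2021, 6, 24), date(2021, 6, 5)]
    print(metric_row(history, as_of=date(2021, 6, 30)))
```

With this shape, a business user asking for a 14-day or 90-day variant only changes the `windows` argument; nobody has to write a new definition.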
[00:59:54] Unknown:
I'm not 100% sure what the right solution is here yet, but I know a standardized metrics layer will make it a lot easier to make reverse ETL more accessible to business users, and I'm super excited to see what happens there and tap into it. Well, for anybody who wants to get in touch with both of you, I'll have you each add your preferred contact information to the show notes. And as a final question, I'd like to get your perspectives on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[01:00:20] Unknown:
Data orchestration. It's the biggest pain point for me, trying to make sure everything runs in the right order at the right time, or else you end up with stale data being sent to places where downstream automation is then negatively impacted.
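The "right order" part of that problem can be illustrated with a small dependency graph. The sketch below uses Python's standard library `graphlib` (Python 3.9+) to compute a valid run order; the task names are hypothetical, and a real orchestrator also has to handle scheduling, retries, and freshness, which this deliberately leaves out.

```python
from graphlib import TopologicalSorter


def run_order(dependencies):
    """Return a valid execution order.

    dependencies maps each task name to the set of tasks that must finish first.
    Raises graphlib.CycleError if the tasks depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical pipeline: load sources, build models, then sync downstream.
    pipeline = {
        "load_crm": set(),
        "load_events": set(),
        "build_models": {"load_crm", "load_events"},
        "sync_to_destination": {"build_models"},
    }
    for task in run_order(pipeline):
        print(task)
```

If the sync step ran before the models were rebuilt, it would push yesterday's values, which is exactly the stale-data failure described above.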
[01:00:35] Unknown:
And for me, it's streaming or real time, just because those are the 1 to 5% of use cases that we can't solve by building more features in our product until there are improvements to the underlying technology. I think it will happen. I think it'll happen incrementally, but I can't wait till it's all resolved and we don't have to have any caveats there.
[01:00:49] Unknown:
Well, thank you both very much for taking the time today to join me and share your experiences working in the space of CDPs and reverse ETL. It's definitely a very interesting set of problems and a growing need for a number of customers and companies. So I definitely appreciate the time and energy you're putting into that space, and I hope you enjoy the rest of your day. Yes. Thank you. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Guest Introductions
Understanding Customer Data Platforms (CDPs)
Challenges in Building Unified Customer Views
Role of Data Warehouses in CDPs
Reverse ETL vs. Operational Analytics
Evolution and Use Cases of Reverse ETL
Future Opportunities in Reverse ETL and CDPs
Real-World Implementations and Challenges
Key Lessons and Best Practices
Industry Trends and Future Directions
Closing Remarks