Summary
The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for data processing. When it was difficult to wire together the event collection, data modeling, reporting, and activation, it made sense to buy monolithic products that handled every stage of the customer data lifecycle. Now that the data warehouse has taken center stage, a new approach of composable customer data platforms is emerging. In this episode Darren Haken is joined by Tejas Manohar to discuss how Autotrader UK is addressing their customer data needs by building on top of their existing data stack.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack
- Your host is Tobias Macey and today I'm interviewing Darren Haken and Tejas Manohar about building a composable CDP and how you can start adopting it incrementally
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what you mean by a "composable CDP"?
- What are some of the key ways that it differs from the ways that we think of a CDP today?
- What are the problems that you were focused on addressing at Autotrader that are solved by a CDP?
- One of the promises of the first generation CDP was an opinionated way to model your data so that non-technical teams could own this responsibility. What do you see as the risks/tradeoffs of moving CDP functionality into the same data stack as the rest of the organization?
- What about companies that don't have the capacity to run a full data infrastructure?
- Beyond the core technology of the data warehouse, what are the other evolutions/innovations that allow for a CDP experience to be built on top of the core data stack?
- added burden on core data teams to generate event-driven data models
- When iterating toward a CDP on top of the core investment of the infrastructure to feed and manage a data warehouse, what are the typical first steps?
- What are some of the components in the ecosystem that help to speed up the time to adoption? (e.g. pre-built dbt packages for common transformations, etc.)
- What are the most interesting, innovative, or unexpected ways that you have seen CDPs implemented?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on CDP related functionality?
- When is a CDP (composable or monolithic) the wrong choice?
- What do you have planned for the future of the CDP stack?
Contact Info
- Darren
- @DarrenHaken on Twitter
- Tejas
- @tejasmanohar on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Autotrader
- Hightouch
- CDP == Customer Data Platform
- Segment
- mParticle
- Salesforce
- Amplitude
- Snowplow
- Reverse ETL
- dbt
- Snowflake
- BigQuery
- Databricks
- ELT
- Fivetran
- DataHub
- Amundsen
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines. RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team. RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again. Visit [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack) to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
[00:00:16] Unknown:
Legacy CDPs charge you a premium to keep your data in a black box. RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost-effective solution. Plus, it gives you more technical controls so you can fully unlock the power of your customer data. Visit dataengineeringpodcast.com/rudderstack today to take control of your customer data. Your host is Tobias Macey. And today, I'm interviewing Darren Haken and Tejas Manohar about building a composable CDP and how you can start adopting it incrementally. So, Darren, can you start by introducing yourself? Sure. Thanks for having me on today. My name's Darren Haken. I'm engineering director for a company in the UK called Auto Trader.
[00:00:56] Unknown:
For those listening today who don't know who we are, we're the largest automotive marketplace in the UK. So, predominantly cars, but any kind of vehicle that you want to buy and sell,
[00:01:08] Unknown:
we can help you with that. And, Tejas, how about yourself? Hey, everyone. Glad to be on the show. I'm Tejas, cofounder and co-CEO of a company called Hightouch. We're a San Francisco-based, you know, Silicon Valley startup. The overall thesis of Hightouch is basically that companies have a tremendous amount of data and insights sitting in their data warehouses, data lakes, and cloud data platforms like Snowflake. And our goal at Hightouch is to help companies do more than analytics with that data. So our goal is to actually help companies use it for operational purposes, like, you know, personalizing customer interactions that are driven off of a CRM like Salesforce or, you know, running targeted ads on platforms like Facebook and Google, or even just automating, you know, legacy processes that happen in things like an ERP system.
[00:01:56] Unknown:
And going back to you, Darren, do you remember how you first got started working in data?
[00:02:01] Unknown:
I do. So I actually started working in data about 15 years ago in a fintech, and I vowed never to work in it ever again. At that time, it was very much focused on, like, traditional GUI-based reporting and all those kinds of classical approaches that didn't feel very much like engineering. And my background is in engineering. So I made a big vow never to go back to it, but here I am. And so I think the space has become incredibly exciting over the last, I don't know, 5 to 10 years, right, where it's become more engineering focused, with the proliferation of AI and multiple other things. So I actually was a consultant prior to working where I am today. And I got really interested in how we help organizations unlock data. It became a very kind of key component.
So most of the conversation prior to that would always be about, like, how do we adopt cloud? How do we do really great things? How do we build great products? And then the conversation shifted a lot to, like, how do we do more analytically? How do we use ML? How do we use our data? And so I became very interested in that process. And so I've been passionately focused on that topic at Auto Trader. It's one of the key themes that I try to drive forward. So, yeah, never say never, it would seem.
[00:03:09] Unknown:
And Tejas, you've been on the show before, but for people who haven't listened to those episodes, can you refresh our memory of how you got started in data?
[00:03:16] Unknown:
For sure. So I actually got started in the data industry from the vendor side. It's a little bit different than Darren's perspective, where he was, you know, actually on the internal data teams at these companies. I actually got started in data working on data products. So a little over 7 years ago now, I joined a company in San Francisco called Segment. That was my first company sort of in the Silicon Valley area. And, basically, what they were trying to do was create this thing called a customer data platform, or the central place where all your data could go so that it could be sent to the different tools that a business team used. And,
[00:03:57] Unknown:
that's how I got introduced to really the data space and data warehouses and all that jazz. And I'll also add as a side note that I forgot to expand the acronym earlier of CDP, which is customer data platform, as you mentioned. And so that kind of brings us into the core of what we're talking about here, which is what do you mean when you say composable CDP? How does that differ from the kind of general category of customer data platform as people have come to start discussing it? Yeah. 100%. So,
[00:04:27] Unknown:
yeah, the idea of a customer data platform, as I mentioned, is basically a SaaS tool that allows you to send all your data to it. Usually, the way they work is they give you some sort of SDKs that you can use on your website or mobile app, or server-side APIs, to track user events. So user events and user properties are basically, you know, the fundamental types of data that you track into a CDP. Things like login, sign up, product added, cart checked out. And then once it gets into the CDP, basically what the system can do, and popular companies in this space are tools like Segment, which is now owned by Twilio, or mParticle,
is allow you to, you know, send that data to a bunch of different systems like Salesforce or Amplitude or Google Analytics or just, you know, the hundreds of different SaaS tools that any business uses. Now what I saw when I was at Segment was basically that the data warehouse became this huge investment for companies. I mean, it had already been around for decades at this point, obviously, but with cloud data platforms like Snowflake, Databricks, BigQuery, it became way easier to get data into a central place as an organization, and much more accessible to all types of companies and teams. So effectively, what we came up with was this concept of a composable CDP, which means providing that sort of CDP functionality directly on top of the data warehouse as a source of truth instead of using a proprietary SaaS system as a source of truth, because that never fully happens in our experience.
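As an editorial illustration of the event tracking described above, the following minimal Python sketch shows what a CDP-style user event record might look like before it is sent onward to a warehouse or downstream tool. The function name `build_track_event` and all field names are assumptions made for this example; they are not the Segment, mParticle, or Hightouch API.

```python
import json
from datetime import datetime, timezone


def build_track_event(user_id, event, properties=None):
    """Build a hypothetical CDP-style 'track' record for one user action."""
    return {
        "type": "track",
        "user_id": user_id,
        "event": event,
        "properties": properties or {},
        # ISO 8601 timestamp in UTC, recorded when the event is built
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Events of the kind mentioned in the conversation: sign up, product added.
events = [
    build_track_event("user-123", "Signed Up", {"plan": "free"}),
    build_track_event("user-123", "Product Added", {"sku": "ABC-1"}),
]

# Serialize each event as one JSON line, a common shape for loading
# behavioral data into a warehouse table.
for record in events:
    print(json.dumps(record))
```

In a composable setup, records like these would land in a warehouse table first, and any downstream tool would be fed from that table rather than from a vendor-owned store.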
[00:05:59] Unknown:
And so, Darren, you mentioned that you have kind of gone down this path of building out a CDP in this more composable paradigm versus the monolithic approach that came to kind of dominate the category in the earlier iterations. And I'm wondering if you can just start by discussing what were some of the types of problems that you were trying to address by using this composable CDP approach that weren't necessarily readily solvable through the, I'll say, traditional, with air quotes, approach of just building out a data platform and having a BI solution, that kind of necessitated going this extra step of moving into what could be categorized as a customer data platform versus just a generic data platform?
[00:06:43] Unknown:
Yeah. Okay. So the starting point of our journey into this CDP, or especially composable CDP, space was that we started looking at how we improve the underlying technical platform at Auto Trader for how we track users and understand user behavior. Right? And so we've been doing that for actually several years using a product called Snowplow, which is a really great product to help you make sense of your users. And so we were using that readily and, similar to this ethos of the data warehouse, all that kind of rich behavioral data ends up in our data warehouse. And we built out a whole new kind of realm of really great insight reporting, running A/B tests, right, and validating different experiments and all that kind of good stuff. So we reached a much higher level of quality with this user behavior data. And then we reached this point in our journey where we started to say, well, we have this rich user data.
Let's start doing more with it than just experimentation, insight, reporting, and analytics. Right? And so we started exploring different things, like, we'd like to use it for personalization. Our marketing team also approached us and started asking to use this data and activate it in different tools. Like Braze is a marketing platform that we use today. But there are millions of them, right? Like, the number of social platforms always grows. It doesn't seem to shrink. So we started looking into this space, and stumbled across this kind of term, this CDP term. And so we started looking at the more classical monolithic CDPs.
And, like, as Tejas said, they are SaaS solutions, and they will do everything. So they have a ton of capabilities, really, and they bring them all together into a single platform. So there were 2 problems that we faced with that. Well, there were a few, but 2 that stand out. So one is we'd already invested in the way that we track users today, and that is a key capability that a lot of these SaaS solutions want to take on. They want to be able to do that whole piece. And so in order for us to start adopting a CDP, you essentially have to replatform so many things that you may have invested in. So, I mean, I'm sure lots of organizations out there have already invested in how they understand user behavior because it's a key part, right, of building great products.
So we kinda felt like, a, we'd have to throw all of that away for a monolithic approach. But, also, we didn't need all the capabilities of a monolithic CDP initially. I mean, the way we do product development is using Agile. Right? So we wanna do iterative development, and we want to approach our technology in the same way. So when we looked at CDPs, we actually said, well, we wanna go through one iteration of using a CDP, which is actually, let's help our marketing team. Let's take some of our rich user behavior data and send that to a tool like Braze. And then the answer is, oh, well, we have to buy a huge monolithic tool that does many capabilities in order to do that, and we have to replatform. That's a huge commitment and investment. So I'd like to use Agile to be iterative about it. And that's kinda why we got very interested in this composable model of, like, how do we select software that helps us unlock the next iteration and so on. And I guess that's how you end up with a composable pathway.
[00:10:04] Unknown:
There are a number of different directions that we can go from here. I guess one thing I'd like to get out of the way first is, in terms of the kind of monolithic or integrated CDP that characterized the kind of initial definition of this category versus this incremental approach, that it was an all-in-one solution, so it was easy to identify what are all the capabilities that I get out of it, versus with a composable one, it's okay. Well, now I have to do a choose-my-own-adventure situation, which sounds a lot like the modern data stack, which is also a choose-your-own-adventure. Now I have to do all the kind of baling wire and twine to keep it all held together. And, also, one of the promises of that fully integrated CDP stack is that it has all of the opinions about how your kind of customer journey should look, of how to model this data, how to represent it to these different systems so that a nontechnical user can say, oh, yes. Let me go ahead and pull that off the shelf and, you know, throw my credit card at the problem, and I don't have to worry about all the other engineering that goes into it versus with this composable approach. Now because it's choose-your-own-adventure, now I have to say, okay. Well, I have to have a technical team on staff who can do all that data modeling, metrics definition, kind of integration path. And I'm wondering what you see as the overall kind of risks and, you know, benefits of that trade-off between the fully integrated versus the composable versions of this problem space.
[00:11:24] Unknown:
So, yeah, the benefit side of it is, like, when we're in a space where things could change rapidly, we're in an emerging technology space, we're in a space of innovation, this modularity can be really, really valuable because we can select the best-of-breed tools for different components. And I think that's kind of one of the interesting junctures we're at now with these CDPs. So there was a generation one, which is monolithic, and they built all these components and it's all in one. So you can go to market very quickly with it, but maybe you're not getting the best of breed with every component. And say, I don't know, like, ChatGPT is a very topical point right now. But maybe, like, there's some sort of language processing model where we can do really clever things with the data, and therefore, we can accelerate certain things around activating our, you know, user data or something. Having this modularity means you can slot different components in. Right? And they can be best of breed, and we can accelerate things. Whereas, I think the monolithic approach is really good in a very stable environment when we understand the problem really, really well. And what we're really trying to do is just standardize it and make it as easy as
[00:12:27] Unknown:
possible. Yeah. One thing I'd add to that, or actually, 2 things I'd add to that. One, I think the monolithic approach to anything is actually good when the technology is, like, all adopted together. So I'll give an example here. Again, this is probably a controversial example and, you know, I mean, for any backlash here, but we're good partners with both Fivetran and dbt. And Fivetran has been adding, you know, like, dbt functionality to their core product. Right? So, you know, adding like a dbt Cloud kind of version within Fivetran. And I think that kind of makes some level of sense. I'm not saying that, like, dbt won't be able to differentiate by building additional features and additional products on top of dbt, but it makes some level of sense because when you, you know, buy a data warehouse and you buy something like Fivetran and start getting data into it, the first thing you have to figure out is how do I transform that data. So it kind of makes sense to think of that as like a bundle in some ways, even though the technologies are separate. Whereas, when a company, especially an enterprise company like Auto Trader or Warner or some of our other, like, enterprise customers, actually looks to, you know, approach this problem of CDP and improve things like their personalization across email and ads, etcetera, they're usually a fairly mature company that already has a lot of technology across the board.
Like, one of the things Darren mentioned as a benefit of the modular composable approach is actually the ability to switch out the technologies. But the other advantage is honestly just to use the technologies you already have in place in your business. Like, if you're already collecting data to the warehouse some way, you can use that. If you've already built some data models in the warehouse, you could use that, etcetera. Versus the more packaged traditional CDPs, which tend to work off very rigid data models where, you know, everything has to be tracked in a certain way by the CDP, everything has to be modeled in a certain way, or the profile matching feature won't work or the activation feature won't work or the integrations won't work or the audience builder won't work. And that's actually what poses challenges with these monolithic solutions: you know, a company doesn't usually go on this journey all together, in one stroke. It's something that happens over time. The data stack is built over time. You need to be able to create value on top of it, and that's where a solution like Hightouch comes in. It's like, you know, what's the fastest way to create value on this data? It's to start taking it from the warehouse and bringing it to other systems through this process of reverse ETL. The other thing I would mention is that there's this common misconception that a monolithic solution would mean, you know, you don't need a technical team to support it. The reality is that since monolithic solutions, like, you know, Segment or mParticle, require you to track data in their format, you know, from your website, from your app, etcetera, you actually do need an engineer to implement all that tracking. Right? You need an engineer to actually go add code to your website that says, when someone clicks the sign up button, send a sign up event to, you know, Segment or mParticle or a packaged CDP.
And for, like, a, you know, Shopify store or something like that, that can be fairly straightforward, and maybe take, like, a month or something like that. But if you look at a mature enterprise, you know, like Auto Trader or a company with many, many business units, you know, that could take 6 months to a year just to get some initial use cases set up and then an ongoing process as the company evolves. With something like Hightouch and a composable CDP approach, you do need a technical team too, but just a technical team that can write SQL and access the data you already have. So the same teams that are using tools like Tableau or building reports or doing BI at a company today versus engineers that actually go change a product, which is generally quite a strenuous process at a larger company. And to kind of dig in on that point a little bit more about the requirements of the engineering team
[00:16:17] Unknown:
for the Shopify scenario, you're likely going to have an engineer on staff who can add that tracking code, but who doesn't necessarily have the expertise to understand how to manage the life cycle of a data warehouse, do the data modeling, to be able to figure out, okay, this is what a metrics definition looks like. This is how I enforce it at the different stages. You know, it's a very different set of capabilities and probably a different scale of organization that are going to have those capabilities around. And so I think that that maybe is one of the tipping points of where you go from monolithic CDP because I don't have the data staff to be able to support all of that life cycle management, and I don't necessarily have opinions about what that data modeling looks like versus the composable approach of I already have invested in my data stack, and now I just wanna be able to start getting better customer insights out of all this event driven analytics that I maybe have been piping into, you know, the Segments or the mParticles of the world.
[00:17:12] Unknown:
Exactly. I mean, I think that's really spot on. If you are a company where it's actually easier to go reach out to the website developer or the app developer and get them to change the site than to talk to someone like a data analyst at your company, because that role doesn't exist, then, of course, you know, there is a real advantage in going the monolithic CDP route. But if you're already using data a lot and using these BI data warehouse stacks, even if the data is not perfect, you probably have something there that you can work with that's much easier than starting from scratch.
[00:17:42] Unknown:
So that's kind of where the composable approach comes in. But both have some place in the world, I think. And I guess it's a trade-off. Right? Like, you meet lots of organizations that start tracking users where they started with an all-in-one Google Analytics solution, and then they grow and then they outgrow it and they have to move on to a different platform, or maybe they don't. So there's definitely choice there. But I think Tejas hinted at something that seems to be a really valuable aspect that we've seen in terms of the benefits, which is that a lot of the tooling that we use is very much like the modern data stack. And so that's actually made it more successful for us, as more of a well-established engineering team, to adopt, because it's like, hey. You can use SQL. You can use dbt in that data warehouse you've used for all sorts of other problems. That's readily accessible, rather than it, you know, being in the SaaS tool where they have to grok the whole scenario.
But, absolutely, it is trade-offs. So, yeah, it's not necessarily that monolithic is bad. I think I tried to emphasize the merits and value of a composable CDP because it feels like the industry is very indexed on all-in-one solutions right now. And so, like, you know, the new breed or, like, the new paradigm of thinking about this is definitely around this composability and everything. And we always see this, right, in software, this oscillation back and forth between all-in-one monolithic, and then we move to smaller systems, and then we go, that was bad, and back. But, yeah, I think it's good that there's choice.
[00:19:14] Unknown:
And then in terms of the kind of generational shift, where the monolithic or fully integrated CDP came about was, I'm kind of estimating, probably around the 10 years ago time frame, which was kind of the inflection point of when we started to move into the current era where we are now, where cloud data warehouses were not ubiquitous. They were just beginning to come onto the scene. You know, there were data warehouses as an architectural paradigm, but generally only for large organizations because you had to buy expensive appliances or run infrastructure on-site to be able to manage it. Yeah. Exactly. Have DBAs who are expert in snowflake schemas or Data Vault, and to where we are now with the, you know, composable modern data stack.
Wondering what you see as some of the core kind of evolutions or innovations that came in that time period from when CDPs first came out as a product category to where we are now talking about composing them out of multiple kind of bits and pieces rather than a whole cloth, that led to the capability of having that CDP like experience, but built on top of a more distributed and more choose your own adventure substrate.
[00:20:32] Unknown:
Yeah. For sure. I'll add some. I'm sure Darren has some as well, from his experience in data teams over those years. But you mentioned one of the ones that I think is really the biggest and has a lot of cascading impacts, which is the separation of, you know, compute and storage, which really came about because storage became dramatically cheaper a little bit over a decade ago, which allowed, you know, these cloud providers to offer storage services for extremely cheap, like Amazon S3, and for, you know, products like Snowflake and Google BigQuery and even things like, you know, AWS Athena, and Databricks to some extent, to be built off of this fundamental innovation. And what did that unlock? So actually, you said something that was spot on, like, you know, the CDPs sort of came about around the same time as these cloud data warehouses. It's interesting because I think CDP companies and martech companies and adtech companies were some of the original, like, you know, highest spenders and biggest customers of the cloud data warehouse platforms. So, you know, Snowflake was the cool kid on the block, getting to its first, my understanding is, 10,000,000 to 20,000,000 of revenue.
They were actually, you know, mostly serving MarTech and AdTech companies that were trying to find the technology that could store all their customers' data and build things like a CDP or other types of marketing technology or advertising technology platforms. So, I mean, that made a lot of sense that the 2 technologies came about at sort of a similar time. But what we actually saw was that, you know, companies ended up using these powerful cloud data warehouses as an investment internally, outside of the scope of just using SaaS tools like CDPs that obviously use things like Snowflake under the hood. And I think the next biggest innovation I saw that came about after, you know, the cloud data platform and the separation of compute and storage was actually this idea of ELT, which just made it tremendously easier to run a data team. Before, it was such an expensive and arduous task and something you had to really get right when you were doing it; otherwise, you were gonna have to figure out how to scale this thing up and go through huge nightly transformations and stuff like that. You just had to put so much thought into, like, what data was going into the warehouse, you know, right away. There was no looking back, no way to run super, you know, big transformations at scale once the data was in the warehouse. With the cloud data warehouse, that completely changed, and that made it possible for services like Fivetran to come up, which are, like, you use 200 SaaS tools in your company. Just click a button and we'll get that data into the data warehouse, which is a really powerful thing to happen in the ecosystem.
And it also, you know, enabled things like, okay. If you wanna start using the data for a new use case, you can just transform it slightly in SQL and then start using it for a new use case in a platform like Hightouch. So I think you mentioned it spot on, which is that the separation of compute and storage and the cloud data warehouses is really what unlocked a lot of these capabilities for businesses. Darren, anything to add there?
[00:23:42] Unknown:
Yeah. I think that's pretty much what I would say. I think one way I would position the data warehouse today, which I do think is important, is it's almost become an appliance. It's not just a data warehouse to write SQL queries. So, like, when I said I would never work in data again, that kind of generation of data warehousing that I worked in was very much, you know, a SQL interface to access and query data. It was never ever considered to be like APIs and an appliance that I could use to do all sorts of things. Whereas now it just feels like the modern data warehouse, the likes of BigQuery and Snowflake, it's not just really about how they can store massive amounts of data, and it's in the cloud. They've got the appliance and they've got really rich APIs on top, and that gives the ability to create an ecosystem around it, such as Hightouch and Fivetran. And so they've commoditized the whole space and, you know, that's how we've ended up with it being like that centerpiece. So it is kinda different to it just being a warehouse to store and query for analytical purposes. It's actually built an ecosystem around it, and I think that's helped make a real shift in the last decade.
And then you're right. Right? The ELT approach combined with this kind of, ecosystem and this commoditization of things, it's just made the the ability to kind of get data in really, really fast and really easy.
[00:25:06] Unknown:
Yeah. I mean, the only thing I'd really add to that is that both of our responses are somewhat from a data engineering and data team perspective. But the biggest thing I saw change at companies between, you know, the inflection point and the creation of the CDP back in, let's say, 2014 to 2015, all the way to maybe 2018, 2019, when I decided to leave Segment and actually start Hightouch, was how people viewed the data warehouse. When I first joined Segment, people's view of the data warehouse was completely different. They saw it as this advanced analytics tool that they would reach for when they couldn't answer a question in Google Analytics or Salesforce or Adobe Analytics or Omniture or Amplitude or one of those specific analytics tools built for a certain role or function in an organization. If you couldn't answer your question there, then you would go over to the data warehouse and the data analytics team, who would do some more advanced SQL and stuff like that for you. Now, when I left Segment, really what I saw was that things had changed even for companies that didn't adopt the modern data stack, quote, unquote. Fivetran, dbt, all that sort of stuff wasn't even that popular at the time. It still isn't even that popular in the enterprise as a whole. But, you know, even for companies just investing in things like Snowflake, Tableau, Looker, the data culture had really changed, where the data warehouse and BI reports weren't the advanced analytics tool. They were the main tool people were going to to answer questions about what's going on in their business and how their campaigns are performing, and for pulling user lists and all sorts of things. So if people are already using that as the main source of truth, why not keep investing in that and get more value out of it, rather than starting from scratch and having, you know, 2 parallel universes that you have to maintain? Yeah. 
That's a good point actually around the culture. There's, like, there's definitely
[00:26:59] Unknown:
a movement that I've seen around non-engineering disciplines having, like, a hunger and an appetite for data in ways that 10 years ago you wouldn't have seen. Like a marketeer, for example, or somebody in an operations role. They want to do smart things with data in a way that I've not seen in the past. Maybe it's because we're exposed more to AI and ML and other things. I also find that people just know what cloud is more, so that you could actually talk about, say, Google Cloud or AWS, and people in other roles in companies kinda know what that is.
So that's definitely driving more kind of demand, I think, on data teams and engineering
[00:27:37] Unknown:
teams as well, that the culture is shifting, and we're seeing that as well. So that's definitely something I've seen. I think too, it's also a matter of the kind of perceived activation energy for any particular data request, where 10, 15 years ago, it was, oh, I wanna be able to answer a question about x, y, and z, and it was, oh, that's actually in 3 different systems, so you're gonna have to give me, you know, a half-million-dollar budget and 6 months before I can answer your question. Whereas now it's, oh, okay, well, we are either already collecting that and I just need to do a couple more database joins, or, you know, oh, I don't have data from that system yet, but give me a week and a couple of engineers, and I can get that for you. So it's a much lower barrier to entry, and it's also a much higher visibility of what data can do within an organization and the importance of being able to combine data across multiple sources, so that there is that kind of appetite for even engaging in that work versus where we were 10, 15 years ago. And then digging more into the CDP-specific aspects of that data experience, I'm wondering if you can talk to some of the end user experience: what are the roles that you're trying to serve by creating this kind of customer data platform, as compared to a generic data platform that serves the entire organization for, you know, logistics and page traffic and whether or not I have enough stock or my HR systems, and what are the elements of that user experience that come together as you compose these CDP layers?
[00:29:11] Unknown:
So I actually think they're kind of the same types of people. Because, like, my background, and where I focus today at Autotrader, is building platforms and platform capabilities. And one of the components of that is we build the data platform, which is assembling capabilities around data, right, to serve people. And we actually think about internal users as customers of that platform, as well as our partners and our actual end users, our actual customers. And so they are customers, like a marketeer and a salesperson and others. So across all of it, for me, building technology capabilities that give them a great experience is paramount.
I think within the realms of the CDP space, I definitely see a huge focus from people like marketing. But then we've expanded, and now we've got, like, customer service and sales. And so actually, initially I thought it was more of a specific set of people, but I don't actually think it is from the data platform route. And I think that's one of the benefits of a composable CDP, actually. I think we've either touched on it or pointed it out directly through this conversation, but it is the same set of tools that the engineers get to use. And it's also that if somebody wants to use Looker to understand HR records, they also want to use Looker to understand some of their user behavior. There are definitely components in a composable CDP where we do still want to offer a really good user experience. I mean, like, one of the things we talk about in CDPs is it's basically profiles. Right? It's like properties about Darren, for example, and who I am. And so, like, we do want to offer a good experience for that. So for us, that was actually using Hightouch, which is where Tejas is today, because that gave us an experience on top of our data warehouse to kind of look at that, and it's domain specific to the CDP.
But I also do wanna stress the point that it allows us to use generalized modern tools and techniques where it makes sense in a composable CDP, rather than it all being specific to that technology. Tejas, I don't know if you want to add anything to that. Yeah. 100%.
[00:31:23] Unknown:
I mean, I think one of the important parts of the composable approach is that it's not just that it's, you know, more flexible, that you can start using the data in the warehouse right away, that the warehouse is the source of truth. It's actually that the idea of the warehouse being the source of truth and building off the warehouse does provide powerful functionality to the end business users as well. So I'll just give an example. You know, we serve a very large pet store company in the US, and they've always had this challenge where they can't query, let's say, users based on the pets they have, or based on the households they're in, or based on these related models that actually fit together in a customer 360. You know, a customer 360 view is not just the user and some traits about them and some events they took. It's all these connected objects that really reflect the customer. And, you know, with the sort of traditional CDP-based approach, you're operating off of the event stream that you're capturing from your website, just, you know, events users are taking on a website or app or stuff like that. With a composable CDP-based approach like Hightouch and our Customer Studio, we can actually tap into all the different data models that exist in the data warehouse, you know, leveraging the relational nature of the data warehouse and how powerful of a data store it is. So I think what's really exciting about the composable CDP approach is actually bringing the power of the data warehouse, you know, its ability to store everything in the business, its ability to have all these really dynamic data models, its ability to have not just data that needs to be activated, but also the ad reporting and the campaign reporting and all the information about how these campaigns are performing, so you can analyze the audience performance.
And then bringing that to the actual business teams is what's really exciting to us. So on top of the reverse ETL technical platform, which just allows data teams to sync any data from the warehouse to one of these 200-plus SaaS tools or ad networks or whatever it is, we actually have also built little apps. Like, it was originally called Audiences when Darren adopted it at Autotrader, but now it's called Customer Studio, which is like a whole visual app that sits directly on top of the warehouse for marketing teams to jump in, build audiences, run A/B tests, see how these audiences are performing, and sync them to different tools that actually run campaigns.
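The pet store example above comes down to building an audience from related warehouse tables rather than from a flat event stream. Here is a minimal sketch of that idea, using an in-memory SQLite database as a stand-in for the warehouse. The `users` and `pets` tables, their columns, and both function names are hypothetical illustrations and are not taken from Hightouch or from any customer's schema.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory stand-in for a warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT, household_id INTEGER);
        CREATE TABLE pets (pet_id INTEGER PRIMARY KEY, owner_id INTEGER, species TEXT);
        INSERT INTO users VALUES (1, 'ann@example.com', 10);
        INSERT INTO users VALUES (2, 'bob@example.com', 10);
        INSERT INTO users VALUES (3, 'cy@example.com', 11);
        INSERT INTO pets VALUES (100, 1, 'dog');
        INSERT INTO pets VALUES (101, 1, 'cat');
        INSERT INTO pets VALUES (102, 3, 'dog');
        """
    )
    return conn


def users_with_pet(conn, species):
    """Audience: users who own at least one pet of the given species."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.user_id
        FROM users u
        JOIN pets p ON p.owner_id = u.user_id
        WHERE p.species = ?
        ORDER BY u.user_id
        """,
        (species,),
    ).fetchall()
    return [row[0] for row in rows]


def household_members_with_pet(conn, species):
    """Audience: every user living in a household where anyone owns the species."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.user_id
        FROM users u
        WHERE u.household_id IN (
            SELECT o.household_id
            FROM users o
            JOIN pets p ON p.owner_id = o.user_id
            WHERE p.species = ?
        )
        ORDER BY u.user_id
        """,
        (species,),
    ).fetchall()
    return [row[0] for row in rows]
```

The second query is the kind of audience that is awkward to express over a per-user event stream, because membership depends on an object (the household) that connects several users.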
[00:33:52] Unknown:
I think that's really what's exciting, like, the ability to bring the warehouse to those business teams, where they live. Yeah. I think that's actually made me think. I've got a great example of that. So there's 2 things, right, to the interface. And this is, I guess, one of the flexibilities about the data warehouse being central to a composable CDP: it isn't just about one interface, which is kind of what we're both skirting around, really. It's about multiple. So there are certain scenarios where, say, a marketeer in my org today would be interested in looking at data that would classically be visualized, say, you know, in a CDP.
But they're also interested in looking at, you know, like, campaign performance and marketing spend at the same time. And in that instance, they're in Looker, which is our BI tool, and they're looking at their own world. Right? So they're doing that. And then at other times, they're using an interface like Hightouch for different jobs. So it's different roles and different jobs, and I think that's one of the flexibility points you get with a composable CDP, right? It's not a single interface. It can be multiple, and you can fit that to your organization.
[00:34:59] Unknown:
Yeah. And I'll do a quick plug. We just released something today. I'm sure it'll not be today by the time the podcast comes out, but, you know, we released something called audience analytics today, actually, that allows you to see some of those analytics of, you know, how people in a certain audience are performing, or how the campaign that's running off a certain audience is changing behavior, like revenue or purchase frequency or things like that. You can do some audience-level attribution directly inside of the Hightouch platform. And, frankly, I think it'd be hard to build features like that on top of a more packaged CDP model, because you don't have all the data in it to sort of close the loop. Whereas the warehouse is the source of truth for the rest of the business, like your product analytics and your inventory and your in-store purchases, like when you actually walk into a physical store, which allows companies to really analyze things full circle in one
[00:35:56] Unknown:
platform. Another aspect of the CDP experience that is advertised a lot is the idea of the customer journey, of charting the path of the customer through your overall experience from, you know, first exposure to your business all the way through to they've bought one of your products, and then maybe they come back. And I'm wondering what you see as the level of importance of that as a feature of the CDP, and maybe some of the ways that you are working to recreate that experience in one of the layers of that composable environment built on top of the warehouse.
[00:36:34] Unknown:
Totally. I mean, what we find is that a customer journey is basically just a series of different audiences, you know. Customers who added something to their cart but didn't check out. Customers who did check out. Customers who abandoned their cart but are high value, customers who abandoned their cart but are low value. This is really just a series of audiences built on top of each other. And sometimes there's some, you know, air traffic control type features above those audiences. Like, we have a feature called priority list, which means customers can only be in one of these audiences at a given time, and here's sort of the waterfall. You know, they're either in this one or they're in the next one, and you can sort of plot out that whole user journey in the UI. But, yeah, I mean, what we found is that since the data warehouse is the source of truth and has all the information about the customer life cycle from all parts of the business, it's the best place to build those sorts of interfaces that help you find out, you know, what customers are in this very exact situation, or even figure out, you know, did a customer open the email that you sent to them, or did the customer click on a link that you sent to them, etcetera, those sorts of closed-loop metrics.
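The "priority list" waterfall described above can be sketched in a few lines: audiences are checked in priority order and each customer lands in the first one they match. This is an illustrative sketch only, not Hightouch's implementation; the function name, the customer fields (`abandoned_cart`, `lifetime_value`), and the audience names are all assumptions made for the example.

```python
def assign_priority_audiences(customers, audiences):
    """Place each customer in at most one audience.

    `audiences` is an ordered list of (name, predicate) pairs, highest
    priority first. The first predicate that matches wins, which gives the
    waterfall behaviour: a customer is either in this audience or falls
    through to the next one. Customers matching nothing are left out.
    """
    assignments = {}
    for customer in customers:
        for name, predicate in audiences:
            if predicate(customer):
                assignments[customer["id"]] = name
                break  # stop at the first (highest-priority) match
    return assignments


# Hypothetical audience definitions, ordered from highest to lowest priority.
EXAMPLE_AUDIENCES = [
    ("high_value_cart_abandoners",
     lambda c: c["abandoned_cart"] and c["lifetime_value"] >= 500),
    ("cart_abandoners", lambda c: c["abandoned_cart"]),
    ("high_value_customers", lambda c: c["lifetime_value"] >= 500),
]
```

Because order decides membership, moving an audience up or down the list changes who receives which campaign without touching the audience definitions themselves.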
[00:37:46] Unknown:
Yeah. I think one of the important pieces when you approach a composable CDP, that is definitely worth investing in, is good foundational technology, whatever you pick, for how you track your users. So, like, for us, we use Snowplow for that. And, you know, it's the garbage in, garbage out problem. So whatever tool you pick, or if you write your own, you do wanna have a data model for user tracking. The tracking space needs to be robust and of high quality, because then you end up with high quality data going into the data warehouse. So I definitely recommend people who listen to this think about that space. And I think that's one of the interesting parts about composable CDPs. Like, when I looked at the market of CDPs, they were never exceptional at every piece. So we'd find that, like, they'd be great at visualizing a funnel or something, but then how much emphasis are they putting on data quality and data governance in the tracking space? Like, I've never found one that was master of all things.
So no matter what you pick, monolithic or composable, for me, one of the key things that's enabled us to drive forward and go really fast is to think about tracking. But outside of that, in terms of the visualization: a product manager that's looking at, like, the checkout process, the level of granularity and detail at which they want to explore the data is completely different to, I don't know, like, an executive who just wants to see a high level conversion funnel, or a marketeer. So I think one of the things that comes out of having the data warehouse there and having the rich data underneath it, and we can store it all now, thanks to cloud data warehousing, is you can have multiple views of a funnel.
So I've not seen a solution that, you know, solves everyone's problems and everyone's needs and everyone's interfaces all in one solution. That sounds amazing if it exists.
[00:39:46] Unknown:
In terms of the adoption path, I'm wondering, for people who are evaluating the possibility of composing their own CDP, what do you see as the minimum baseline for, you know, "you must be this high to enter this ride" kind of a thing? And then from there, what are the first incremental steps that they would take towards saying, okay, now I have this baseline, and this is the next thing that I need to add to be able to start getting to the overall vision of: I have an entire CDP, I know everything that my customers are doing at all times and how they progress through these different experiences, and, you know, my business is now running like a finely oiled machine.
[00:40:28] Unknown:
Yeah. I can take a stab at this. So a couple of things. I would say the most important thing we tell our customers, and we try to take a consultative approach and really, you know, work with their marketing teams, work with their data teams, work with their chief marketing officer or chief information officer to figure out what's the right approach for the organization. And the biggest thing I tell them is to focus on the business use case, not the technology. I've seen so many companies that approach the problem of CDP saying, we wanna build this grand source of truth, we need this feature and that feature and this other feature. And then, you know, when we get into the company, we realize that they actually have some data sitting in a Tableau report that they need to automate to a system like Salesforce or Adobe or Braze. And while I'd love to charge them a million dollars for that, I don't know if it deserves that sometimes.
So I think, you know, companies don't always need a grand solution to solve their problem, or they don't always need this whole checklist of technology features. They need to focus, especially in this economy, on, you know, what's the business problem they wanna solve and what's the fastest way to do that with the fewest moving pieces. And I think that's why this composable approach is really taking off, especially in these times when companies are tight for budget and are looking for projects that can deliver value quickly, and are not looking to spin up huge initiatives that need to change the whole world. And then the second thing I would mention is that a lot of people underestimate their data maturity. That's my perspective here.
Darren and the team at Autotrader actually have quite a strong data stack, you know, with tools like Fivetran and dbt and Snowplow. But a majority of our enterprise customers, you know, don't even have one of those SaaS vendors. Like, the only SaaS vendor they're using for the modern data stack is actually Hightouch, as their CDP, and maybe they even see it more as a marketing technology solution than a modern data stack solution. And the reason for that is because a lot of these large companies, as I mentioned, are already using tools like Tableau or Power BI. Like, maybe they don't see it as a data warehouse. They see it as those sorts of systems that they're using as their source of truth whenever there's a question that comes up on how a campaign is performing, or whenever someone needs to pull a list of users that match a certain criteria, whether it's for the marketing team or for the finance team or for the support team. So what I like to tell companies is, like, hey, if you're already using the data warehouse today through your BI tool for analytics, you should probably start activating that data. And yes, you might need to make it better. Yes, you might need to work on some data modeling or data quality, but you need to do that no matter what solution you go with. It's better to invest in the thing you're already using, especially since it's the most powerful technology on the block, than to start from scratch and try to build a whole new world.
[00:43:24] Unknown:
And in terms of that adoption path, and particularly on that question of data modeling and data quality, I'm wondering what are some of the elements of the ecosystem that are available to accelerate some of that adoption path. I'm thinking in terms of, you know, prebuilt dbt packages that I can use to say, this is the data that I'm getting in, and I know that I wanna transform it into this model to be able to enrich this customer profile in this way. Or you mentioned Fivetran a few times, just kind of like the data integration path. Is there a core set of, you know, data sources that are useful for being able to build up these rich customer profiles, and just kind of adding jet fuel to your go-kart for being able to go from, you know, I have this idea, to now I'm ready to start putting this in front of people?
[00:44:09] Unknown:
Yeah. I mean, I think the answer to that is definitely that the modern data stack has lots of components now that help you get to market fast. I mean, like, with one of the first use cases we had around the composable CDP, we did have this rich data. Tejas is right. Like, there are so many companies that have data somewhere. Unless they're a brand new company, they have some data. And, you know, even if it's a spreadsheet, if they can get it into a data warehouse, that's great. Well, then to go to market, really, for a lot of companies, it is about things like Fivetran or other tools like that to get data in quickly. And then, yeah, we're definitely seeing this ecosystem of, like, dbt packages and other common transformations.
Like, we use a help desk solution. There are prebuilt dbt packages that kind of give you a starting position straight away. And I was like, oh, that's awesome. You know, like, we're seeing that ecosystem happen, and that's helping people move really quickly. And then I think the other side of it is that tools like Hightouch help you get data back out super quickly. Like, I remember in our first scenario with the composable CDP, we had some of our profile data. We wanted to be agile and say, right, what's the first business case or useful thing that we could do with our data? And like I said at the beginning of this, it was around, like, let's activate some of that data and put it into Braze. And that's actually when Tejas and I started working together: we used Hightouch, and we kinda got that working in a day. And that's great. And so I think the fastest route is probably picking technologies that help validate the use case. We don't want these scenarios where it takes 6 months, I think you said, right, for a whole business plan of tools and solutions to get there. So I do think a lot of the modern data stack already helps with some of this. One of the gaps that we definitely saw was the reverse ETL, or, like, the getting-data-back-out part.
And that's where I think tools like Hightouch are helping close that gap as well. And I'm sure there are plenty of other opportunities for Tejas and his product launches to help with all sorts of other problems as well.
[00:46:08] Unknown:
And so in your experience of building out the CDP capabilities at AutoTrader, 1 of the kind of useful ways for other people to learn is kind of what were the most informative failures that you made on that journey?
[00:46:22] Unknown:
Oh, that is a good question, isn't it? Well, the first failure was probably when we started to look. We lost a lot of time trying to just pick a CDP, actually, if I'm honest. Because, you know, like, it was expensive. I mean, our data volumes are, like, among the largest in terms of traffic in the UK. People visit Autotrader a hell of a lot. So it was like, okay, well, we've got this business case, and then, you know, to actually validate it is a huge upfront cost. And so we actually lost a lot of time, because that felt like the only solution at the time was to go and buy a CDP and then work out how to integrate that with the rest of our data technology. So our first failure was that we lost our ability to think about agile and iterations and validating business cases. We got lost in the world of vendor management. So don't do that, I guess.
And then I think the other thing that we tried to do was have, like, a single data model for profiles. Like, because a lot of CDPs do have one. Like, a profile is the profile of Darren and of Tejas. But we've seen it be a little bit different to that, where we've ended up with a few variants. So maybe there are profiles that are more tailored to, like, helping with customer service and customer support, and that's kinda different to somebody that's looking at prospects and leads and sales operations and so on. So I think trying to embrace the fact that the data models may diverge or iterate, or you may have failures in them, you know, rather than, like, "we can create one global profile, and it will solve all of our business problems", was also quite painful. I think they were the 2. We definitely saw a risk or a pain point of not wanting to deal with the reverse ETL. It's super costly to just push data to places. And so before we found tools that did reverse ETL, like Hightouch, I think the other thing we realized we were gonna lose a lot of time over was just building all the pipes and all the connections.
And, you know, even just, like, I don't know, a marketeer wants to use a wholly brand new social platform because we think it could drive engagement with our brand or something. And then, you know, we have to go back and say, it's gonna take 3 months of effort, or we'll put it on the backlog. Like, that lost innovation was a problem as well. But, yeah, the 2 biggest ones were getting lost, really, in the "we need to pick a solution that solves all of our problems", I guess. Like, "we need the CDP". And then I think, just as a byproduct of that, it was "we also need a single global profile that can support everybody".
But, actually, I think we've got more sort of messy and accepted that, and it's been helpful. Or more variety.
[00:49:13] Unknown:
And, Tejas, in your work of building out these CDP capabilities at Hightouch, what are some of the kind of dead ends or missed opportunities or kind of misdirections
[00:49:25] Unknown:
that you've gone through? Yeah. So I think early on in Hightouch, we always had this conflict as a company. Like, you know, are we building for the data person or are we building for the marketer, or, like, what core persona are we building for? And it was a question that came from everyone, you know, everyone from startup incubators we participated in (for example, we're a Y Combinator company here in San Francisco) to investors when we were fundraising, to partners like other tech companies we partner with, like Snowflake and Databricks and Google Cloud, or even consultants, like, let's say, a Deloitte or something like that.
They would always ask us, you know, are you guys for marketers or data people? Because that's sort of the paradigm that exists in the market today. There are marketing technology tools and data technology tools. And, again, what I said earlier, I really stand by, which is that the exciting opportunity sits between technical teams and business teams. So we've just decided to embrace that we are a platform for both data people and marketing people and other operations teams across the company, whether they're helping support or finance or other teams. And that's just kind of the nature of what we're building. And one of the powers is that we actually have certain sections of the app, like the area where you can connect to your Snowflake, or the area where you can define some data models with SQL or by selecting tables and views, that are built for data people. And then we have other areas of the application where, if you want to get marketers in the application, you can let them use the audience builder or let them click a few buttons to sync data to destinations.
And, yeah, just, like, serving multiple personas, versus just saying, you know, we have to be a company that serves one user, which is sort of the fundamental rule of building a company that seems to be out there, is really important. And I think that's one of our big differentiators, you know, between the composable CDP and the rest of the market. I mean, it's not just composable in terms of being able to plug into the technologies you use, but it's able to plug into the different teams and organizational processes that actually exist in a company. So you're able to leverage the data team for what they're good at, the marketing team for what they're good at, and compose that together into a strong solution for the overall organization.
So, yeah, I think just embracing that and really leaning into it, versus in the early days when we had an identity crisis around that, is one of the biggest mistakes and then later biggest successes of our company, and one that we plan to just keep doubling down on as we scale. And I think that's super important. Actually, that is definitely something that
[00:52:18] Unknown:
we were trying to work through when picking a CDP: that it couldn't just be an engineering tool, but it also couldn't just be a tool that we buy as SaaS that's, like, really good for a marketeer to use and that engineering hate. So I think any tools that emerge in the space that can, like, bring these communities together and help them solve problems together are really powerful. Like, that's a tension I've witnessed across all of my career with different disciplines and engineering. You know, you don't want tools where it's, like, "I hate them for picking this tool. I hate you so much. It's awful. There's no API." I'm an engineer, that's why I'm always going to speak with engineers on that one. But then, you know, marketers are like, "you've picked a really complicated engineering tool. I don't understand it." You know, it's all in code, and they don't get it. Like, you need tools to bring them together, because it is a multidisciplinary problem space.
[00:53:07] Unknown:
Exactly. And I think a lot of the SaaS tools in the market are taking the stance, which is, like, do this without engineers, do this without technical people, or taking a stance where, you know, technical people can do everything, or something like that. Maybe that one's a little bit less common, but for marketing technology, I think that's really silly at the end of the day. And in reality, what is the data team if it's not working with the business teams? And how is the business team gonna handle all these data concerns without having, you know, some sort of analysts or engineers or someone technical to help them out, no matter what solution they buy? Because as you said, Darren, companies already have data if they've been around for a while. And, yeah, that's something we're really leaning into, and we think that there's a cultural aspect that we can really change over the next, you know, 10 years by pursuing this, like, data activation vision, in the same way that the data warehouse changed a lot of the analytics culture in companies over the last decade. And in your experiences
[00:54:09] Unknown:
of working in this space both from the side of implementing a CDP for your company at AutoTrader and working in the space of providing some of these capabilities to consumers, what are some of the most interesting or innovative or unexpected ways that you've seen people solving the problems that CDPs are aimed at?
[00:54:27] Unknown:
Totally. So let's get technical for a minute, because I feel like we've been talking about data strategy for a while. I think something that's interesting that we've built in our platform, and that we actually built back when we were just doing reverse ETL, which is still an awesome part of our platform and what a majority of our customers use us for, is this idea of change data capture directly off of the data warehouse. A lot of times, before a company buys something like Hightouch, they have written reverse ETL scripts in house to get a little bit of data into Salesforce or Braze or Facebook or whatever it is. And often what those scripts look like is basically a for loop over the results of a SQL query that grabs all the data: select my whole users table once a night, sweep through it, and send it to the API of something like Salesforce or Braze. There are many optimizations that you can make going from a basic script like that to a platform like Hightouch, from batching to retries to dead letter queues to how you handle failures and all sorts of things. But one of the ones I think our customers don't think of as much is this idea of running a delta directly inside of the data warehouse, figuring out what changed, and only sending those changes to the downstream tools like Salesforce and Braze. So in Hightouch, it's really cool. You have an interface where you can just say, I want this column in Snowflake to be in this column in something like Salesforce or Braze. And under the hood, we'll only be sending the changes over time.
And the way that works is we actually save a copy of the state of the SQL query that you're using in Hightouch, whether it's generated by audiences or whether you input it directly into the Hightouch UI. We save the state of that inside the warehouse and do some joins and delta calculation in SQL to figure out what's changed since the last run, and we only send the changes over, which makes the syncs way faster as well as more efficient on these downstream tools' APIs.
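To make the delta idea above concrete, here is a minimal sketch of snapshot-based diffing. It is not Hightouch's implementation: the table names (`users`, `current_state`, `last_synced`), the columns, and the `compute_delta` helper are all hypothetical, and an in-memory SQLite database stands in for the warehouse. It only shows the general shape of saving the previous query result and using joins to find what was added, changed, or removed.

```python
import sqlite3


def compute_delta(conn, model_query):
    """Diff the current result of model_query against the last snapshot.

    Returns (added, changed, removed). Hypothetical helper for illustration.
    """
    cur = conn.cursor()
    # Materialize the current result of the model query.
    cur.execute("DROP TABLE IF EXISTS current_state")
    cur.execute(f"CREATE TABLE current_state AS {model_query}")
    # Snapshot of what was sent on the previous run (empty on the first run).
    cur.execute(
        "CREATE TABLE IF NOT EXISTS last_synced "
        "(id INTEGER PRIMARY KEY, email TEXT, plan TEXT)"
    )
    # Rows present now but absent from the snapshot are new.
    added = cur.execute(
        "SELECT c.id, c.email, c.plan FROM current_state c "
        "LEFT JOIN last_synced s ON s.id = c.id "
        "WHERE s.id IS NULL ORDER BY c.id"
    ).fetchall()
    # Rows present in both whose values differ (IS NOT is NULL-safe in SQLite).
    changed = cur.execute(
        "SELECT c.id, c.email, c.plan FROM current_state c "
        "JOIN last_synced s ON s.id = c.id "
        "WHERE c.email IS NOT s.email OR c.plan IS NOT s.plan ORDER BY c.id"
    ).fetchall()
    # Rows in the snapshot that no longer appear in the query result.
    removed = cur.execute(
        "SELECT s.id FROM last_synced s "
        "LEFT JOIN current_state c ON c.id = s.id "
        "WHERE c.id IS NULL ORDER BY s.id"
    ).fetchall()
    # Advance the snapshot. A real sync would do this only after the
    # downstream API calls succeed.
    cur.execute("DELETE FROM last_synced")
    cur.execute("INSERT INTO last_synced SELECT id, email, plan FROM current_state")
    conn.commit()
    return added, changed, removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", "free"), (2, "b@example.com", "pro")],
)
query = "SELECT id, email, plan FROM users"

first = compute_delta(conn, query)  # first run: every row counts as added

conn.execute("UPDATE users SET plan = 'pro' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 2")
conn.execute("INSERT INTO users VALUES (3, 'c@example.com', 'free')")

second = compute_delta(conn, query)  # second run: only the differences
print(first)
print(second)
```

On the second run only three rows would need to go to the downstream API (one insert, one update, one delete) instead of the whole table, which is the efficiency gain described above.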
[00:56:36] Unknown:
Yeah. I think that's one of the most innovative or exciting pieces around CDPs that we can see emerging, and that, clearly, Tejas has also spotted, which is this real-time component. They all tend to let you sync data into them in real time, but the activation piece is usually a little bit different. So I think the more innovative ways I see people do it are where they start to get closer and closer to real time. Like, I don't know, surfacing something to somebody who's on a help desk and saying this person's not happy, or they're more likely to buy this vehicle because of x, y, zed, and things like that. The more things we can do around that space, the more novel they feel for people, but they shouldn't be, right? It's because technology has been intrinsically more batch oriented. When I've spoken to other companies, they seem to fixate on a daily batch decision-making process.
But, yeah, the more we can move into the real-time space, and being able to also activate in real time, feels like a big step forward.
[00:57:38] Unknown:
And in your own experience of working in this space, what are the most interesting or unexpected or challenging lessons that you've learned personally?
[00:57:46] Unknown:
The AI and ML space feels really challenging still today, I think. A lot of the work has been around sending information to other parties and things like that. But the challenges we've been picking up on are probably more combined with real time, actually. Doing AI and ML in batch still feels good. But then, I guess, the other novel way of thinking about it is how do we do a real-time relay of data, push it into some sort of prediction, and then relay that back, and things like that. So to answer the challenging part, which is what jumps out in your question for me: it's still really challenging, I think, for orgs to solve that problem of combining all of this. And that's the case even with a monolithic CDP. In fact, I think it's even harder in that space because it's all closed off within the SaaS solution.
At least in my org today, we have access to streams of data and the data warehouse and other data stores. So that's helpful. But, yeah, combining all these things, composing it, and then using it for things like AI is helpful as well. And I'd say it's still challenging. It's solvable, right? Engineering can do it, but there are definitely some primitives, I imagine, that would make it much easier to solve.
[00:59:05] Unknown:
I agree. And, actually, the stuff Snowflake and Databricks, and BigQuery as well, have been adding here is interesting: streaming inserts in BigQuery; something similar that's now built into Snowflake, Snowflake streams, which is like native CDC, so you don't have to do any of the hackery Hightouch does in all these different data warehouses; as well as AutoML, I think, from BigQuery, where you can just use that directly in SQL. Those three concepts together are quite inspiring. I think right now, as you mentioned, it's still a lot of work to string that together, and maybe you can't quite get that real-time prediction experience that you mentioned, Darren, super easily.
But you can see the writing on the wall that the clouds have recognized this fallacy that you have to have one system for batch and one system for real time, and are starting to improve that. And there are some really exciting case studies out there too. I know JetBlue has a case study with dbt where they're actually getting flight bookings and operating on them within 2 minutes of it happening in the GDS system where flight bookings are actually processed, which is pretty epic. Excited about where the space is headed.
[01:00:17] Unknown:
And for people who are trying to address some of these business concerns and get visibility into their customers, what are the cases where a CDP, whether composable or monolithic, is the wrong choice, and they're better served by just going the route of build your data stack and build your BI and be done with it?
[01:00:36] Unknown:
I find it difficult to separate the two out, right? Because sometimes you don't know what you don't know. And I think this is some of the beauty I find in composable CDPs: I don't have to pick. There are a lot of scenarios where people have approached me to seek advice about picking CDPs. At that point, usually, when people speak to me, they've already picked a CDP or they have a couple in mind, and they want advice on, you know, did we look at any, and what works for you? And, usually, I try to discourage that until I've worked out whether they have the basics: do you have a data warehouse? Do people use a BI tool in your company and access data? I know it's like the maturity model thing. But if somebody wants a CDP and they don't have some of that stuff, sometimes I actually try to push them back to just get better with the analytical stack, the modern data stack, in the first place, because it's a foundational piece of technology for lots of use cases.
So I think that's where I would discourage people or say, don't do it. Sometimes you dig into things with people and they'll say, we want this CDP, and we want it so we understand our customers or something. And it's like, well, maybe you don't need that. You just need a reporting platform and some customer data or something like that. So with all of it, it's just trying to push people back in that direction as a starting point. I think you need these foundations before you layer on things like a CDP. It feels the wrong way around to me to say, let's buy a CDP, and then after that we'll invest in a data platform and then an analytical platform.
[01:02:15] Unknown:
Yeah. I think what Darren said is spot on. It's a hard question to answer because, in reality, I think the most important thing is what problem you are trying to solve in your business and where you are today with solving that problem. Then you can figure out the right path from there. Too many organizations get hung up on trying to build this modern stack they see online, whether it's a modern marketing stack, which shows a CDP and a brand new ESP and maybe something like LiveRamp for your ads, and all these tools that are gonna cost you, like, $3,000,000 total, or a modern data stack, where companies think they need every tool in the modern data stack to proceed with building their data infrastructure.
In reality, companies already have data. They already have tools. You need to figure out what's the problem you're trying to solve right now, what's the fastest path to do that, and just start there, versus trying to run RFPs for the most common toolset. It sounds obvious when you say it this way, but it's not super obvious, because there's all this FOMO and these fear tactics that different vendors are putting online saying, you know, what good is bad data, or you need a CDP to do personalization, or without the modern data stack you can't do analytics. And that's just really confusing. I think the vendors that will win ultimately are the ones that focus on clarity and education and telling customers very specifically how to solve the problem they have at hand, when to use their solution, and when to use other solutions.
[01:03:56] Unknown:
Alright. And as you look to the future of this composable CDP ecosystem, are there any new components or capabilities or tool sets or educational resources that you hope to see introduced or that you're planning on working towards?
[01:04:17] Unknown:
Yeah. For sure. So on the data side of things, I think we're always making our platform more and more robust. As the core cloud data platforms that we're built on add features and capabilities, we're looking into how to hook into all of those things, whether it's AutoML in something like BigQuery, or Snowflake streams, which is again that change data capture feature, or the materialized views features of these platforms, which now have some incremental processing for certain things like averages and different aggregates. So we're always looking at those. We always get super excited when there's a new release. It sounds nerdy, but it's the truth. We get really excited across the whole company when there's a new release of Snowflake or Databricks or Google BigQuery, because we're like, how can we bring this value to our customers and make our platform more powerful based on all the improvements coming to these platforms?
On the marketing and business side of the equation, we are just trying to take basic workflows that, simply put, require asking someone for a CSV today, and bring those all into the platform in a really streamlined way on top of the data warehouse. Because today, people just don't see the data warehouse as a tool for them unless they can write SQL, which is really crazy in my opinion. It's the best data source in the company. It should be more accessible. And I don't think self-service BI and the tools in that space, while great, fulfill that vision. Even for me and some of my day-to-day tasks, I don't feel like Looker quite serves me, to be honest. And so, yeah, we're adding more capabilities. We're not purists about, oh, we just have to be syncing data. We're adding more capabilities that make it easier for our customers to solve these challenges in a streamlined flow. So, you build an audience, you sync it to places.
How do you do an A/B test on that audience? We don't wanna make you have to buy another 3 tools to do that. Or how do you analyze the performance of that audience and see what its impact is on those users' behavior? We want you to be able to do that directly inside of a platform like Hightouch too. I think the best-of-breed nature of it will allow companies to go invest in best-of-breed tools to do that even more. But for companies that just want a flavor of analytics on their audiences or just a flavor of experimentation, we wanna really streamline that for them, because the processes that a lot of companies have today are just so archaic. So that's what I'm really excited about: shipping more features that streamline that process for the different business teams.
[01:06:46] Unknown:
Yeah. I think for me, the future is a combination, actually, of the data strategy and community piece and the technology. At Autotrader, we built a composable CDP maybe about 12 to 18 months ago, and I just couldn't find anything. I mean, the fact we're even using the term composable CDP is quite new. So what I'd hope for the future is that we'll have more of a community that can talk about CDPs from the perspective of engineering. You asked the question, Tobias, around novel ways of solving this. The more we can get that out in the industry, and people talk about it, and there are tech talks about, like, we did it this way, and this is how our CDP looks, that's exciting because it will help us learn from each other. So I think that, and just more data strategies emerging that adopt composable CDPs, out and visible where we can see them. People do this a lot with dbt, and they'll talk about architectures of how they've built it. So more like that will be really helpful. And then I just think there are components. Like, at Hightouch, you're working on some things that I see as gaps in a composable CDP, like audience creation and maybe smart things to do with activating those audiences and all that kind of thing.
So there's loads of tech there, and I think that is exciting too. But, definitely, a hopeful future for me is the data strategy, the community, and the conversation as well. A hundred percent.
[01:08:19] Unknown:
And are there any other aspects of this ecosystem of the composable CDP, or the work that you're doing at Autotrader to implement it or at Hightouch to support it, that we didn't discuss yet that you'd like to cover before we close out the show? No. I think that's it. Alright. Yeah. I think we went pretty comprehensive. Yeah. Alrighty. Well, for anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And as the final question, I'd like to get your perspectives on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[01:08:52] Unknown:
Oh, okay. So my answer is data catalog and metadata tooling in the modern data space. This is an area that I'm very interested in. I'm very passionate about data mesh and data products, which is a very decentralized architecture. And with that decentralization comes a need for many different kinds of capabilities, and I think the metadata space is very, very interesting. That's another example of why I think things like composable CDPs are really interesting. We're looking at a few companies that exist out there now, and open source products like DataHub and Amundsen, for metadata tooling. Because our profile data or our user data is available, I can automatically index it in that space. But in terms of the modern data stack, that's one piece that I'm very interested in.
[01:09:50] Unknown:
Yeah. On my end, I think it's what we touched on earlier, which is just the ability to do different types of compute off a central cloud data platform. I think it's really, really exciting what Databricks has, obviously, where you can use Spark and actually have access to native data and run any compute you want, but you also have a really easy-to-use SQL engine on top of the data warehouse for more BI and analytical workloads. I think that needs to be taken a step further. I see the Snowflake folks adding Snowpark. I see them adding things like materialized views that can be incrementally processed. So I'm excited for these cloud data platform companies to offer multiple modes of computation directly off one platform and one source of truth when it comes to data, even if that data is copied multiple times or something under the hood.
I just want that user experience to be available for platforms like Hightouch and for all the companies that use these technology platforms.
[01:10:35] Unknown:
Alright. Well, thank you both very much for taking the time today to join me and share your experiences and perspectives on this emerging space of the composable CDP. Definitely appreciate the time and energy that you're putting into that and in sharing your experiences. So, thank you again, and I hope you enjoy the rest of your day. Thank you. Thanks for having us. Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Introductions
Darren's Journey into Data
Tejas's Background and Introduction to CDPs
Understanding Composable CDPs
Problems Addressed by Composable CDPs
Monolithic vs. Composable CDPs
Engineering Requirements for CDPs
Evolution of CDPs and Data Warehousing
End User Experience in CDPs
Customer Journey and Audience Segmentation
Adoption Path for Composable CDPs
Informative Failures in Building CDPs
Innovative Solutions in CDPs
Challenges and Lessons Learned
When Not to Use a CDP
Future of Composable CDPs
Closing Remarks and Contact Information