Use Consistent And Up To Date Customer Profiles To Power Your Business With Segment Unify

Hello, and welcome to the Data Engineering podcast, the show about modern data management.

Legacy CDPs charge you a premium to keep your data in a black box.

RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost effective solution.

Plus, it gives you more technical controls so you can fully unlock the power of your customer data.

Visit dataengineeringpodcast.com/rudderstack today to take control of your customer data.

Your host is Tobias Macey. And today, I'm interviewing Kevin Niparko and Han Han Wang about Segment's new Unify product for building and syncing comprehensive customer profiles across your data systems. So, Kevin, can you start by introducing yourself?

Yeah. Absolutely. Tobias, thank you so much for having us on.

I actually started as Segment's first data analyst back in 2015, so really helping our early team figure out our business model, inform our product direction, and go-to-market strategy through our own data.

But one of the things that I learned really quickly is that being the lone data analyst at an analytics company means that I often was the first internal customer.

Essentially the dog eating the dog food every day, constantly querying our data, our data warehouse, consuming it in various tools. And so I think that vantage point gave me a good perspective: to really be able to answer some of the toughest questions that our business was facing, it required really high quality customer data to be able to understand our customer base, what they needed, and how we could better help them.

We'll hand it over to Han Han to share a little bit more about herself.

Hey. I'm Han Han.

So excited to be here today. So before Segment, I spent 4 years at Amazon as a PM on machine learning and AI products.

I ran the Alexa smart lighting business, helping people turn on and off their lights from the comfort of their couch.

And I also worked on a new-to-world, kinda controversial product called Tone, which analyzes how you're coming off to others based on the vocal biomarkers in your voice stream.

And at Amazon, I really saw the power that good data can bring to a business.

It's such a huge competitive advantage.

And as a PM at Amazon, I could just focus on building these cool new ML experiences. I just always assumed the underlying data was in the right place, in the right format.

So from there, I came to Segment, and what's super motivating to me is that we're democratizing customer data access and these insights so that companies of any size, not just these big tech behemoths, can

make these truly data driven decisions that they need to compete.

And so going back to you, Kevin, do you remember how you first got started working in data?

Yeah, absolutely. So I had done a startup earlier, and that was sort of the first foray. It was a mobile social network, helping bring together friends around common shared interests. And so there was a lot of customer data that we could have had at our fingertips. We ultimately failed to find product market fit and, you know, I reflect a lot on that experience as really not following the data that we had at our fingertips, which was giving us signal as to whether we were on the right path. And so it was an incredible learning from that perspective of how do you really listen to your customers, both qualitatively and quantitatively, to be able to drive product decisions that are going to lead to a better product at the end of the day.

And, Han Han, do you remember how you got started in data?

I think data is just very much in the fabric of being a PM in Amazon. Every week. Right? You have to keep a pulse of what's going on.

When we launch products, like, everything is measured.

Everything is metricked.

The dev teams focus a lot on it. It's part of the operational checklist. You know, it's part of the decision making.

And so I don't know if there's a start. It's just kind of the environment of being at a place that is very data driven and metrics oriented.

And so in terms of the Segment Unify project, which is what we're talking about today, before we get too far into that, can you just give a bit of an overview for folks who aren't familiar with what Segment is, and maybe a little bit about its role in the overall data ecosystem of a given organization?

So, yeah, there's this really long and exciting history around customer data platforms that I'm sure we'll get into later in this conversation. But last month, we launched Segment Unify, which is providing consumer grade identity resolution. And we think this is the next big breakthrough

around

CDPs.

An easy way to think about what a CDP is and where it fits into an organization

is if you were running a business before the Internet, you'd get to know all of your customers personally, right? They'd come into your store, you'd develop this relationship with them. They'd tell you about their lives, the things that they were into, their favorite style,

whether something seemed too cheap or too expensive, whether they like the pink pants or the green overalls. As businesses have moved online, that same interaction continues to happen, but now it's happening in the digital space. It's happening thousands of times per second

across mobile apps and websites and CRMs and marketing automation and contact center systems, on and on. Right? There are so many different tools which are informing that customer's journey. And on the other side, there's this growing ecosystem of ways in which you can put that data to work, from advertising and marketing, to texting, on-site personalization, even now fine tuning AI large language models. And so the uses around customer data are large and continuing to grow. And one of the things that we've realized is that collecting this data for our customers, it's this hard engineering challenge and it requires a lot of advanced infrastructure to get it right. It's not like this really exciting infrastructure that engineers show up excited to build. It's a lot of boring stuff. A lot of the plumbing behind the scenes

to get everything to work well together. And so that's really where a CDP sits within an organization: a set of APIs and infrastructure to get data from wherever it's generated to wherever it needs to go.

And so in terms of the Unify product, can you describe what that is and some of the story behind how it came to be and the role that it plays in the overall Segment product suite?

Absolutely. So one of the things that we've learned along the way from our customers is it's not just enough to get raw data from one place to another. There are a lot of hard problems to make this raw data usable and actionable. So on top of that raw data, with Unify, we're providing identity resolution capabilities to turn the raw data into golden profiles.

Golden profiles are essentially the most up to date trusted digital record of who your users are and where they are in their journey with your business.

It's not necessarily a unique concept in the data world, but there are a few things that we're doing differently with Segment Unify.

The first is that these golden profiles are complete, which means it's the collective understanding

around

who your users are across different touch

points. We've extended this to include data from the data warehouse with reverse ETL. So the insights that data science teams and data engineering teams are generating can now easily hydrate that profile.

The second piece that's unique here is that these profiles are portable, meaning that they can sync across hundreds of different tools that can be powered by the centralized understanding. And then the third is that they're real time and always up to date, meaning that they represent the latest state of a user and where they are in their journey.

The stat continues to blow my mind. We're resolving 250,000 data points to profiles every second and executing over 50,000 incredibly complex profile attribute computations within seconds of receiving new digital signals. So that gives you a flavor of the size and scale and real time performance of the system that we were building.

And as far as the golden profile aspect, as you mentioned, Unify is serving the purpose of bringing together these different data sources and combining them into this one cohesive view of the customer through their profile.

And in terms of that profile object, what are some of the categories of attributes that need to be managed and maybe some of the different sources that those attributes and/or categories of attributes might come from in a given organization?

Yeah. A simple way to think about this is

who is the user and

how are they interacting with your business? So who is your user? Think about these as traits and identifiers.

Traits can break down into a few different categories. You can have raw traits like the user's name or what billing plan they're on.

There's computed traits, so things like total lifetime orders or average order size.

There are also predictive traits. These are inferences about what a user is likely to do next. Things like likelihood to purchase or churn.

Then there are identifiers: how a user ID and an email and a device ID are all linked together in this representation of who a user is across touch points. And then there are events. So this is how is this user interacting with your business.

So think about a common e commerce funnel, somebody is viewing a product, they've added it to the cart and then they've ultimately checked out. All of these are events and digital interactions that are happening with your customers. There's also this need to append

additional context to the profile. So things like what audiences and promotions does a user qualify for

or where they fall in various marketing journeys that are being executed by marketing teams. I think there's this big moment that many data teams have as they mature with their customer data infrastructure, which

is golden profiles aren't the static users table. It's really this dynamic and strategic asset

that requires significant investment to get right.
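
As a rough illustration of the profile shape described here (identifiers, raw traits, computed traits, and events), a minimal Python sketch follows. The class names, field names, and the "Order Completed" event with a `total` property are assumptions made for this example, not Segment's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    # How the user interacts with the business, e.g. "Product Viewed".
    name: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Profile:
    # Identifier type -> all values seen for this user (hypothetical layout).
    identifiers: dict[str, set[str]] = field(default_factory=dict)
    # Raw traits such as name or billing plan.
    raw_traits: dict[str, Any] = field(default_factory=dict)
    events: list[Event] = field(default_factory=list)

    def computed_traits(self) -> dict[str, float]:
        """Derive computed traits from the event history."""
        totals = [e.properties["total"] for e in self.events
                  if e.name == "Order Completed"]
        count = len(totals)
        return {
            "total_lifetime_orders": count,
            "average_order_size": sum(totals) / count if count else 0.0,
        }


profile = Profile(
    identifiers={"user_id": {"u_123"}, "email": {"ada@example.com"}},
    raw_traits={"name": "Ada", "billing_plan": "pro"},
    events=[
        Event("Product Viewed", {"sku": "A1"}),
        Event("Order Completed", {"total": 40.0}),
        Event("Order Completed", {"total": 60.0}),
    ],
)
traits = profile.computed_traits()
```

Because the computed traits are derived from events on demand, the profile behaves less like a static users table and more like a view that changes as new events arrive.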

And in terms of that investment and the evolution of that profile object, what are some of the technical and organizational challenges that come about in terms of understanding what are the actual semantic definitions of some of these attributes to determine how they're computed or aggregated, what are some of the ways that mutation of that profile object can have downstream ramifications on the overall data suite of the organization, and some of those complexities that come about in the overall life cycle of a given customer profile and the way that it is defined in the bounds of a business?

It's a great question. And I think there are sort of 2 hard problems that we sought to solve with Segment Unify.

So the first was around this concept of identity resolution, which is how do you know who your users are across touch points? We can walk through an example here to give you a flavor of why this is such a hard problem. Let's say a customer purchases a product in store and they provide their email address at checkout.

Later they call or text in to support to get help installing this product.

So you have their phone number. And then once the product is installed, they sign up for the service and ultimately get a user ID. Now imagine the same interaction is happening in different sequences with different identifiers, happening potentially hundreds or thousands of times per second.

This is the hard problem of resolving identity at scale in real time.

And there are particularly hard parts that we've uncovered in this journey. So things like anonymous to known, how do you take somebody who's top of funnel exploring

on the website

down to downstream

purchase behavior,

as well as shared identifier detection.

So identifiers can have varying levels of uniqueness.

Phone numbers can be shared among a family, devices can be shared among many individuals, right? Each identifier has a different level of uniqueness that needs to be baked into your identity resolution logic. So that's sort of big bucket number 1 around some of the challenges that businesses have

in implementing identity resolution strategies.
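
The in-store, support, and sign-up example above can be sketched with a small union-find structure. This is only a toy illustration of the general idea of merging identifiers into one profile, not how Segment implements identity resolution. The `IdentityGraph` class and the `shared_types` rule (never merge on identifier types that are assumed to be widely shared) are hypothetical simplifications.

```python
class IdentityGraph:
    """Toy identity graph: a union-find over (identifier_type, value) pairs."""

    def __init__(self, shared_types=frozenset()):
        self.parent = {}
        # Hypothetical rule: identifier types considered too widely shared
        # (for example a family tablet) are never used to merge profiles.
        self.shared_types = frozenset(shared_types)

    def _find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def observe(self, identifiers):
        """Link every non-shared identifier seen together on one interaction."""
        nodes = [(t, v) for t, v in identifiers.items()
                 if t not in self.shared_types]
        roots = [self._find(n) for n in nodes]
        for root in roots[1:]:
            self.parent[root] = roots[0]

    def same_profile(self, a, b):
        return self._find(a) == self._find(b)


graph = IdentityGraph(shared_types={"device_id"})
graph.observe({"email": "ada@example.com"})        # in-store purchase
graph.observe({"phone": "+15550100"})              # support call
graph.observe({"user_id": "u_1", "email": "ada@example.com",
               "phone": "+15550100"})              # sign-up links all three
graph.observe({"user_id": "u_2", "device_id": "tablet-9"})
graph.observe({"user_id": "u_1", "device_id": "tablet-9"})  # shared device
```

After these calls the email, phone number, and `u_1` resolve to one profile, while `u_1` and `u_2` stay separate even though they were seen on the same device. A production system would more likely use per-identifier limits and priorities than a blanket exclusion.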

The second hard problem, which is more thematic across the industry and very top of mind for data engineers and data architects,

is this concept of real time streaming architecture and how it relates to data at rest sitting in a data warehouse.

You hear a lot about Customer 360 and Single Source of Truth or this one data store that's going to rule them all. And I think for most businesses, what we've realized is that this is ultimately a myth.

The reality of customer data infrastructure is it's this tapestry of databases and Kafka streams and data lakes and SaaS tools and data silos, each of which holds a subset of who your users are

and how you can better serve them. And so there's this common challenge that businesses face as they mature on their data infrastructure, which is bringing real time data streams

alongside

data at rest, sitting in a data warehouse. And so that's where our new reverse ETL capabilities come in that we've introduced in conjunction with Segment Unify, which is the ability to query data that's sitting in the data warehouse

and bring it onto the golden profile

without having to spin up a ton of advanced ETL and data infrastructure.

So our systems are operating in real time at around 2,500,000 events per second, but we're also providing data engineers and data scientists the ability to tap into the rich data that sits in their data warehouse and join it onto their golden profile that can be used across the stack.

Yeah. I think when we started pitching this, we said each destination was a data island.

So every destination that you send your profile to in order to activate your customer information, like Amplitude, Braze, Iterable, asks you to send this full fire hose of raw data and then infers a profile.

And each of these downstream engagement apps also needs a specific set of fields and IDs in order to drive the right ROI from their personalization features.

So I think, given that status quo, customers wanted a way out. They're looking for ways to centralize that. They're looking for ways for their profiles to really be the full view and then to pick out parts of the profile and easily send exactly the fields that they need in order to drive the highest ROI in their engagement applications downstream.

And in terms of the return on investment, what are some of the ways that this more comprehensive and cohesive profile object can impact the lifetime value of a given customer and some of the ways that you're able to think about measuring the investment or the impact of this more rich profile on the bottom line of the business?

Yeah. Absolutely. So I think one of the biggest challenges that businesses face is there's data everywhere, but it is often unusable, not sitting in a usable format, and isn't really tapping into its full potential.

And so Unify is really about getting all of this data from where it resides and bringing it together in one place.

So one of the customers that's been relying heavily on Unify is CrossFit, a very active community

of fitness folks. I think we've all had this experience of somebody who's gotten really into CrossFit and can't stop raving about it. They had this huge amount of data and insights from their engaged fitness community, but it was siloed and disconnected. And so they described Segment and Unify as giving them these data superpowers.

So it gives their team this complete user profile and better personalized experiences. They're using it for

virtual contests,

providing local gym recommendations. So converting folks from top of funnel into

joining their community

and then generally helping their customers get more fit through data, which I think is a really exciting prospect.

MongoDB is another one. So, a modern B2B platform. They're using profile sync to get this better understanding of a really complex B2B journey, from multiple stakeholders exploring the product down to a complex implementation cycle. And so they use Segment Unify and profile sync as the foundation for their data strategy. They're joining in 181 additional tables in their data warehouse to really complete the profile and provide all of the dimensions of their accounts and user profiles. So it really gives you a sense for the type of scale and data coverage that's required to really understand these complex sales cycles and be able to deliver the right message at the right time for an account.

And as far as the actual day to day business aspects of how different members of the team, whether in data or business operations or marketing or sales, etcetera, what is the impact on their lives of having this customer profile and some of the ways that they are able to interact with it, both in terms of pulling data into their systems and being able to simplify the availability of information for their own needs, but also being able to provide feedback or requests on ways to evolve the profile, additional attributes to include, corrections on, you know, attributes that are incorrect, things like that?

I think, you know, one of the huge wins is just being able to bring in a lot of different sources and build a complete view. So one of our retail customers is using reverse ETL to send enriched data from their warehouse to marketing destinations.

But this retailer, they've instrumented Segment for their website.

You know, they can track ecommerce traffic pretty easily with Segment. They have been for a while.

But now, with reverse ETL and profile sync, they also wanna add offline retail traffic.

So things like you go into the store and you buy a product and you check out, and then really marry it to this complete view of the customers. This was a super hard problem before because often these things live in different data stores. Offline processing is usually stored in a completely different upstream system. And there's no common link between the customer records you have there and the ones that you might be tracking through Segment. So with reverse ETL, our customers are able to, you know, bring in that offline traffic, use profile sync tables, and join it against the identifiers we're using in Segment, and then actually send that offline traffic into Segment, where now they have this complete view of the customer. They can customize email marketing campaigns and send emails to folks based on products they purchase both online and in the store.

So I think this has been a huge net add to our data teams, where it makes it really easy to bring these two very disparate sets together, and then also to marketers, where now their targeting is better because they know what folks are doing on their website and in the store. That really speaks to me. I do a lot of online shopping and would love to have, you know, a more personalized view across the board.
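
The join described above, matching offline store purchases against identifiers already known for a profile, might look roughly like the following Python sketch. In practice this would more likely be a SQL join in the warehouse. The row layouts (`profile_id`, `type`, `value`, `email`, `sku`) are hypothetical and are not the actual profile sync table schema.

```python
def normalize_email(email: str) -> str:
    # Offline systems often store emails with stray whitespace or mixed case.
    return email.strip().lower()


def attach_offline_purchases(identifier_rows, offline_rows):
    """Attach a profile_id to each offline purchase that matches a known email."""
    email_to_profile = {
        normalize_email(row["value"]): row["profile_id"]
        for row in identifier_rows
        if row["type"] == "email"
    }
    matched, unmatched = [], []
    for row in offline_rows:
        profile_id = email_to_profile.get(normalize_email(row["email"]))
        if profile_id is None:
            unmatched.append(row)  # no common link yet; keep for later review
        else:
            matched.append({**row, "profile_id": profile_id})
    return matched, unmatched


identifier_rows = [
    {"profile_id": "p1", "type": "email", "value": "Ada@Example.com"},
    {"profile_id": "p1", "type": "user_id", "value": "u_1"},
]
offline_rows = [
    {"email": " ada@example.com ", "sku": "A1"},
    {"email": "bob@example.com", "sku": "B2"},
]
matched, unmatched = attach_offline_purchases(identifier_rows, offline_rows)
```

The matched rows are the ones that could then be sent back as events tied to an existing profile. Unmatched rows show where no shared identifier exists yet.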

And digging now into the implementation of the Unify product, I'm wondering if you can just start by giving an overview about the architecture and some of the adjustments that you had to make to the existing Segment technical platform to be able to integrate and provide this Unify feature?

I think there are a couple of really interesting technical challenges the team went through. So for profile sync, really, it was around, you know, driving these real time profiles and then managing profiles at scale for customers. So our managed customers can easily have over 100,000,000 profiles. Each of these profiles, in addition to having all of the traits, also has the full history of events of every single customer going through, and we're tracking all that. So the identity resolution system needs to be able to keep all these profile records up to date in real time. Whenever a new customer comes in, it needs to be able to match it, do the merge on the identity graph, and then, you know, do this within seconds of a person hitting a button on a website.

And then with profile sync, really, one of the big technical challenges was, like, how do we make this work out of the box?

You know, we had a very alpha product where we were syncing profile tables just kind of as is to customers.

And a lot of the feedback was like, hey, it's really hard to query these tables. It's very slow to work with my profiles if I have, like, 100,000,000 profiles.

So a lot of the innovation around profile sync is like, okay, great, well, how do we make these queries work really easily for folks? So it's really performant across your entire swath.

So we paired up with our best data engineer at Segment

to design products for our customers'

data engineers

and thought very deeply on, like, what is the ideal

table structure?

You know, should we materialize these tables in Segment? Should we have customers do the materialization?

How do we optimize

both our materialization

strategy

as well as the tables themselves for fast performing queries across

these giant

amounts of profile data?

And then how do we make that really easy for for your data engineers?

So we provide scripts for materialization with dbt, and we'll provide some scripts for materialization with other tools too.

And as far as the implementation,

what are some of the technical issues that you had to address while developing and launching this product?

I think another maybe interesting story here is on the reverse ETL side. It was really important for us to protect our customers' data privacy.

And so we thought a lot about, okay, great, what is the best design for our customers that preserves privacy and is really performant and efficient too?

So, you know, one of the technical considerations was like, hey, should we copy query results into an S3 bucket and then do the diffing kind of within that bucket, which we thought was maybe slow or inefficient or maybe expensive?

Or is there a way for us to innovate here and do some in-warehouse incremental diffing?

Basically, it runs a checksum operation on the customer's data model in their warehouse and figures out, hey, these are the changes that we then need to sync downstream to Segment.

So on the technical side, we really wanted to optimize on a less compute intensive, more space efficient approach here.

And I think the other win is that we don't ingest data

unnecessarily,

which is a big 1, I think, for our customers' data privacy side too.
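
To make the checksum-based incremental diffing idea concrete, here is a minimal Python sketch. The approach described above runs inside the customer's warehouse. This version runs in-process purely for illustration, and the function names and row layout are assumptions for the example.

```python
import hashlib
import json


def row_checksum(row: dict) -> str:
    """Stable checksum of a row; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_rows(previous: dict, current_rows: list, key: str):
    """Compare this run's rows against the checksums stored from the last run.

    Only checksums are kept between runs, so unchanged rows never need to be
    copied out or re-sent.
    """
    current = {str(row[key]): row_checksum(row) for row in current_rows}
    added = sorted(k for k in current if k not in previous)
    changed = sorted(k for k in current
                     if k in previous and current[k] != previous[k])
    removed = sorted(k for k in previous if k not in current)
    return current, added, changed, removed


first_run = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
second_run = [{"id": 1, "plan": "pro"}, {"id": 3, "plan": "free"}]

state, added_1, changed_1, removed_1 = diff_rows({}, first_run, key="id")
state, added_2, changed_2, removed_2 = diff_rows(state, second_run, key="id")
```

On the second run only row 1 (changed), row 3 (new), and row 2 (removed) would need to be synced, which is the less compute-intensive, more space-efficient behavior the design is aiming for.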

And as far as the adoption path for somebody, what are the steps involved in being able to onboard onto the Segment Unify product? Do they already need to be a Segment customer to take advantage of it? And within the overall Segment product suite, what are some of the hard dependencies to be able to implement Segment Unify within an organization?

Yeah. If you are already a Segment Connections or profiles customer, this is super easy. Right? So if you're already using Segment Connections, reverse ETL is embedded into the Connections part of the Segment app. So you can just find it, set up a source and destination, try it out. You can probably get started in about 15 minutes to send reverse ETL data out to your first destination.

If you are using

Segment for identity resolution

already,

then setting up profile sync is also fairly simple.

Go in the Segment app (we also have an API to let you do this programmatically), put in your warehouse credentials, and we'll start syncing profile sync data hourly to your warehouse.

And then from there, once you have the tables, we do a backfill of your historical records. We make sure you have all the events and traits over time.

Usually, that takes, like, days, maybe a couple weeks, I think, is our official SLA.

And I don't know if I was supposed to share that. It's okay. And then from there, you'll have all these tables now landed in your warehouse.

We also offer a dbt materialization script.

So if you have dbt, you can just run the script.

It'll start materializing profile traits tables and complete, you know, customer records. So we'll create a new table, a materialized view of your customer in your warehouse, after running that. And then you have the data, and you can start, you know, enriching it, joining it, playing around with it directly with your warehouse tools.
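
As a sketch of what materializing a traits table means, the following Python collapses an append-only list of trait updates into one current record per profile. This is not the dbt script mentioned above. The row layout (`profile_id`, `trait`, `value`, `received_at`) is a hypothetical stand-in for the synced tables.

```python
def materialize_traits(updates):
    """Collapse append-only trait updates into the latest value per profile and trait."""
    latest = {}
    # ISO 8601 timestamps in the same timezone sort correctly as plain strings,
    # so later updates overwrite earlier ones.
    for update in sorted(updates, key=lambda u: u["received_at"]):
        latest.setdefault(update["profile_id"], {})[update["trait"]] = update["value"]
    return latest


updates = [
    {"profile_id": "p1", "trait": "plan", "value": "free",
     "received_at": "2023-01-01T00:00:00Z"},
    {"profile_id": "p1", "trait": "plan", "value": "pro",
     "received_at": "2023-03-01T00:00:00Z"},
    {"profile_id": "p1", "trait": "name", "value": "Ada",
     "received_at": "2023-02-01T00:00:00Z"},
    {"profile_id": "p2", "trait": "plan", "value": "free",
     "received_at": "2023-02-15T00:00:00Z"},
]
profiles = materialize_traits(updates)
```

Querying the materialized result is a single lookup per profile, rather than a scan over the full update history, which is the kind of performance concern raised earlier for very large profile counts.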

And talking through the reverse ETL aspects of it, what is the workflow for being able to move from implementing Unify, building out these profile objects, to then propagating those objects into some of the, I guess, I don't know whether to call it downstream or upstream tools since it's a bit cyclical, but some of the other systems that the business uses to be able to track these different customers through their life cycle?

I think that's a really good way of describing it as cyclical.

Right? It's not 1 direction.

Data is now flowing bidirectionally

from data warehouses

into CDP, CDP into data warehouses,

from data sources and destinations

back into CDP and vice versa. And so it really does create this virtuous cycle of

data building on itself, these golden profiles continuing to get richer and richer. I think your question is a really good 1, which is how do you think about

profile sync across the stack?

So how do you think about 1 digital representation

that needs to now be ported into potentially

tens, hundreds of different APIs, different tools with different data models. Each 1 has been designed independent of 1 another.

And I think this is really the foundation of what we've built with CDP: the ability to really deeply understand what is the essence of how this tool defines a customer, and then how do you appropriately map data from this golden record into that tool. And just to give you a sense for the ways in which we can do this now, we can sync a golden profile via an audience in batch.

We can sync a profile via a patch change stream, so constant updates to the profile. We can sync a profile as an event.

We can also run a little bit of arbitrary JavaScript

that our customers can input with functions. And so there's really this growing set of ways in which golden profiles can be synced across the stack. It is a hard problem and that's very much why we've taken this approach of smart defaults, but extensibility

baked in as our core philosophy.

We wanna be able to provide the best understanding of how golden profiles should map across your stack, but to the extent that you wanna customize or extend that, we give you fine-grained controls to do so.
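
The "smart defaults, but extensibility" philosophy can be illustrated with a small mapping function. Everything here is hypothetical: the default field mapping, the destination field names, and the `transform` hook, which is a Python stand-in for the customer-supplied JavaScript functions mentioned above.

```python
# Hypothetical default mapping: profile field -> destination field.
DEFAULT_MAPPING = {"email": "email", "name": "full_name", "plan": "billing_plan"}


def map_profile(profile, overrides=None, transform=None):
    """Map a golden profile to a destination payload.

    Defaults cover the common fields; overrides add or change mappings;
    transform is an optional hook for arbitrary custom logic.
    """
    mapping = {**DEFAULT_MAPPING, **(overrides or {})}
    payload = {dest: profile[src] for src, dest in mapping.items() if src in profile}
    return transform(payload) if transform else payload


golden = {"email": "ada@example.com", "name": "Ada Lovelace",
          "plan": "pro", "ltv": 120.0}

default_payload = map_profile(golden)
custom_payload = map_profile(
    golden,
    overrides={"ltv": "lifetime_value"},
    transform=lambda p: {**p, "email": p["email"].upper()},
)
```

With no configuration the destination receives only the default fields. The override adds a field that the defaults omit, and the hook reshapes the payload for a tool with different expectations.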

And then for the identity resolution element of this problem, where you do have information coming from multiple sources, you're trying to aggregate it into this golden profile.

What are some of the challenges that you face in being able to accurately merge together different data sources into this one entity, as well as some of the toggles that are available for businesses that want to manage the level of confidence that is required before performing that merge operation or being able to include a human in the loop in terms of reviewing: we want to merge these two things together, does this look right? And being able to feed that back into the ongoing operation of the Unify platform.

Yeah. Absolutely. And I think this is one of the fundamental insights that we've had: every customer, every business really has their own unique identity graph that is relevant for their business. It's dependent on how they've implemented tens of different systems, from their CRM to their tracking code on their website or in their mobile apps. And so there really is the need for, one, configurability and flexibility,

but also the ability to

understand how these different identifiers are coming together to develop these golden profiles. And so, we provide a set of configurations

where customers can define

their identity resolution logic.

This has been trained on billions and billions of events. And so we are able to detect when issues may be arising, surface those up to customers for review, and provide

observability

as part of our platform to really give our customers confidence and understanding into

how their identity graph is working, how it's evolving,

and ways in which they may want to adapt or adjust their identity resolution logic

given the signals that we're seeing.

I think the other interesting use case we've seen is profile observability into the way that the Segment identity resolution process works, which we love because it's boosting customer trust in our system. It's helping them with the understanding of how to join the data together.

And so we're no longer just this, like, black box where identity resolution is happening and you get some outputs out.

It's really bringing customers a lot more observability into what's happening in that box and giving them opportunities now to even change things, change the identity resolution, change the rules that they have in Segment, and do adjustments even downstream too.

And for the entity resolution aspect of it, it's one of the perennial problems in computer science, but in data in particular, understanding what are those combinations that are valid. But another interesting angle to this, particularly in analytical contexts, is the semantic elements of what to merge and how, and whether there are additional computations or derivations or enrichments that need to happen in that merge path. And I'm curious how you've addressed that in the Unify product to be able to say, okay.

These are the same entities

and these are the attributes. But in the actual representation, we want this attribute to be rendered differently, whether it's, you know, formatting the address or understanding which address is more up to date and accurate or particularly for things like purchases.

You know, what does it mean for a purchase to actually be completed? Like, do you have to have it in a holding, you know, a holding stage for a little while to determine whether or not they issue a return, etcetera,

and some of those business rules around the entity resolution and attribute merging process?

Yeah. Absolutely. It's a great question. And I think 1 of the hard challenges that many businesses face as they try to build their own identity resolution logic and profile systems. I think 1 of the things that we benefit from here just in terms of the overall approach that we've taken is

1 is a very clearly defined spec

around data inputs.

And so we really do have a set of well defined

scope for who our user is and what they are doing in relation to your business. This includes those specific raw data, raw trait fields, things like address and phone number, which can be structured appropriately.

The other thing, which I think really hits at the heart of your question, is that defining a semantic layer in the abstract is very hard, but doing it in relation to a specific use case and a specific tool provides clarity. And so by

connecting golden profiles with a particular tool,

we are then able

to infer and define the profile

as required for that specific use case. And so it really is about both the inputs,

the resolution logic itself,

as well as the end in terminal use case for the profile

that allows us to understand what is the right representation

of this profile, how should that be mapped into that end tool, and how is it ultimately going to unlock business value that a marketer or a support agent needs to really be able to deliver on that customer experience that they're looking to deliver on. And then as far as the Symantec attributes and the business rules

to kind of compute and derive them, as you are pushing these profiles back out into the other systems, so things like HubSpot, Mailchimp, Salesforce, what have you, what are some of the challenges that you have to address in terms of understanding which attributes to overwrite versus which attributes to append to, etcetera? And, also, because not every platform is going to have all of the same fields, how do you address some of the challenges of regression to the mean, where everything has to just have a baseline set of attributes? And if you want to get more sophisticated, then you start to get diminishing returns because, oh, well, I created this new attribute on this user, but I can't push that into HubSpot. I can only push it into Salesforce and things like that.

Yeah. Absolutely. And this is a really hard challenge that

businesses face. So there are over 10,000 tools in the most recent MarTech landscape. Right? That's 10,000 different APIs, data models, and tools which have been defined and designed independently of 1 another.

I think there are sort of 2 tailwinds here that we get to benefit from. The first is that, given our size and scale, many of these tools and APIs and data models have looked to us and our spec for inspiration. So we actually provide our product as infrastructure for many of these MarTech tools, and they leverage us and look to us to take some of the load off their customers for event routing and now even reverse ETL capability. So we get to benefit from this because there's this growing catalog of tools that are essentially leveraging our data and our spec. And they get to benefit from it because they don't have to recreate the wheel, define all of this data infrastructure, and have customers go through the implementation cycle specifically for their tool. So that's number 1, around some of the ecosystem dynamics that I think really help reduce the complexity of this problem. And then the second is really being focused on extensibility and portability

of profiles. So I think this is unique relative to some of the other data tools or suites, which take this sort of data hoarding approach, the walled garden approach, trying to keep things largely within their ecosystem, playing nicely there but not really well when you try to use a tool outside of that ecosystem. So we provide a ton of flexibility in how these mapping layers occur between this golden profile and these tools. We have this set of capabilities called destination actions, which is essentially a layer of configuration which allows a non technical person, low code, no code, to actually get in there, deeply understand how data is being mapped from the golden profile into a particular tool, and configure that exactly as they expect. And so that level of transparency and observability and configuration is very much at the heart of our philosophy, which is that it's not going to be a 1 size fits all solution. Oftentimes these tools are being implemented specifically for a team and a use case. And so we wanna provide the ability to adapt this golden profile, the centralized understanding of who a user is across an organization, and apply it to a specific tool and a specific use case to really be able to unlock that business value.
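As a rough illustration of a configurable mapping layer between a golden profile and a destination, including the overwrite versus append question raised earlier, here is a minimal sketch. It is not the destination actions API: the `map_profile` function, the mapping format, and every trait and field name are assumptions made up for this example.

```python
from typing import Any


def map_profile(
    profile: dict[str, Any],
    existing: dict[str, Any],
    mapping: dict[str, tuple[str, str]],
) -> tuple[dict[str, Any], list[str]]:
    """Build a destination payload from a golden profile.

    mapping: golden trait name -> (destination field, "overwrite" | "append").
    Traits the destination has no field for are skipped and reported back,
    so the gap is visible instead of silently dropped.
    """
    payload = dict(existing)
    skipped: list[str] = []
    for trait, value in profile.items():
        if trait not in mapping:
            skipped.append(trait)
            continue
        field, mode = mapping[trait]
        if mode == "append":
            # Keep prior values and add the new one if it is not already there.
            current = list(payload.get(field, []))
            if value not in current:
                current.append(value)
            payload[field] = current
        elif mode == "overwrite":
            payload[field] = value
        else:
            raise ValueError(f"unknown mapping mode: {mode}")
    return payload, skipped


# Hypothetical golden profile and per-destination configuration.
profile = {"email": "a@example.com", "plan": "pro", "loyalty_tier": "gold"}
existing = {"Email": "old@example.com", "Plans": ["free"]}
mapping = {"email": ("Email", "overwrite"), "plan": ("Plans", "append")}

payload, skipped = map_profile(profile, existing, mapping)
```

Here `loyalty_tier` ends up in `skipped` because this hypothetical destination has no field for it, which is the "I can push it into one tool but not another" situation described in the question.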

Another aspect of the challenge around customer profiles, and particularly managing them across different tenants of the data ecosystem, is the modeling aspect of it where, you know, particularly when you're talking about things like master data management, golden records, there are things like dimensional modeling to be considered. You know, how do I break this down into the different sets of tables, or do I just have 1 wide table with all of the attributes? What are the different data types that I should use? Do I want an array field, a JSON field? Should it all just be, you know, basic data types? And I'm curious how you thought about those data modeling challenges, particularly as you want to support some of the historical attributes of a customer as they, you know, engage with the business over time, where maybe their address changes, but you still wanna be able to see what their address was because you shipped a product to them at their old address and things like that?

I think on reverse ETL, like, we've gotten a lot of customer feedback that, you know, folks are doing this data modeling, this life cycle development in their warehouse.

So they really want better support for what kind of data modeling their warehouse offers. So things like objects and array support, mapping JSON fields, arbitrary JSON fields that have, you know, maybe a full set of locations listed for a customer, or addresses where they've, like, previously lived. So that's 1 side. And then ways to better hook into that data modeling development life cycle. So, supporting test and prod environments for mapping data downstream, version control, you know, testing out the different models and what gets synced.

So, you know, I think the reality is, like, folks are not going to be doing a lot of this data modeling within Segment. They wanna manage this in their warehouse. A lot of them are using Airflow for orchestration and dbt for transformation already, and our tools need to flexibly handle all the data models that the warehouse has and then be able to very easily integrate with the ways that customers are using those tools already. That's, like, the best, smoothest customer experience for our data engineers.
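To make the address history case concrete, here is a small sketch of a profile row that keeps prior addresses in a JSON array column, plus a lookup for the address that was valid on a given day. The row layout, the `valid_from` and `valid_to` keys, and the `address_on` helper are assumptions for illustration, not a schema that Segment or any particular warehouse prescribes.

```python
import json
from datetime import date
from typing import Optional

# Hypothetical warehouse row: a JSON array column holding address history.
row = {
    "user_id": "u_123",
    "addresses": json.dumps([
        {"address": "1 Old Road", "valid_from": "2021-01-01", "valid_to": "2022-06-30"},
        {"address": "12 Main St", "valid_from": "2022-07-01", "valid_to": None},
    ]),
}


def address_on(row: dict, day: date) -> Optional[str]:
    """Return the address that was valid on `day`, or None if none matches."""
    for entry in json.loads(row["addresses"]):
        start = date.fromisoformat(entry["valid_from"])
        # An open-ended entry (valid_to is null) is treated as still current.
        end = date.fromisoformat(entry["valid_to"]) if entry["valid_to"] else date.max
        if start <= day <= end:
            return entry["address"]
    return None
```

With this shape, a shipment from March 2022 can still be matched to the old address while downstream tools only receive the current one.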

And now that the Unify product is out and you have made it generally available and people are using it, what are some of the most interesting or innovative or unexpected ways that you're seeing it applied?

I think 1 of our beta customers was a midsize ecommerce retailer, and they got their hands on profile sync. And within, like, a day or 2, they had learned dbt, downloaded it, and gotten it running on their site. They built an attribution analysis, and they were starting to play with a user recommendation engine.

So they built this use case over probably days: if a customer hasn't purchased items in a year, they can now identify who those customers are. They never had that visibility prior to profile sync. And then they recommend maybe, like, these 5 top items for those customers to come back into their ecommerce store to buy. And this was a small data team of a single data engineer, and they were able to unlock this use case within days.
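The use case in this story, finding customers with no purchase in the last year and suggesting the top 5 items, can be sketched in a few lines. This is a simplified illustration and not the retailer's actual dbt models: the `(user_id, item, timestamp)` tuple layout and both function names are assumptions, and it only considers customers who have purchased at least once.

```python
from collections import Counter
from datetime import datetime, timedelta

# Each purchase is a hypothetical (user_id, item, purchased_at) tuple.
Purchase = tuple[str, str, datetime]


def lapsed_customers(
    purchases: list[Purchase],
    now: datetime,
    window: timedelta = timedelta(days=365),
) -> list[str]:
    """Return IDs of customers whose most recent purchase is older than `window`."""
    last_seen: dict[str, datetime] = {}
    for user_id, _item, ts in purchases:
        if user_id not in last_seen or ts > last_seen[user_id]:
            last_seen[user_id] = ts
    return sorted(u for u, ts in last_seen.items() if now - ts > window)


def top_items(purchases: list[Purchase], n: int = 5) -> list[str]:
    """Return the `n` most frequently purchased items across all customers."""
    counts = Counter(item for _user, item, _ts in purchases)
    return [item for item, _count in counts.most_common(n)]
```

A campaign would then pair each ID from `lapsed_customers` with the list from `top_items`; a real recommendation engine would presumably personalize that list per customer.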

And I just really love this story because Unify is all about empowering small data teams, giving them the tools that they need to go toe to toe with these big tech employers who have hundreds of data engineers, and making their lives and their jobs a lot easier.

And in the process of building this Unify feature and product, what are some of the most interesting or unexpected or challenging lessons that you've each learned in the process?

I have a couple insights. These are maybe a little bit more hot takes here from talking to our customers lately. So as tech budgets are tightening and our customers are focusing a lot more on efficiency, I'm predicting we're gonna see a consolidation of modern data stack tools. I mean, I think that might be inevitable. There are literally thousands of tools out there to manage your modern data stack, and VCs pumped in a lot of funding during the hot tech days to really grow these tools into a really, like, exciting and large ecosystem of data tools. But I think as folks, you know, focus on efficiency, they're gonna start really asking, like, do we really need, you know, 2 tools that are kind of overlapping, doing the same thing?

We're already kinda seeing that. You know, we saw that recently in 1 of our Unify deals. A customer didn't have a marketing team. They love Segment identity resolution, but they're like, hey, you know, I wanna use Unify and just do identity resolution in Segment. I don't have a marketing team, so I don't really wanna pay for the marketing products downstream. I just wanna pay for Unify, and I'm gonna take that extra money and, you know, boost up the amount of our Connections event volume and really, you know, allow us to expand to a new geo because we can bring in customers now with that extra budget. And so I think we're seeing a bit of that consolidation, and customers really inspecting both the number of tools and how they're being used currently in our customer base.

And then as part of that consolidation, you know, we believe that reverse ETL is part of the CDP just fundamentally, because reverse ETL is about activating our customer profiles and bringing that data downstream. But it's also a huge win from the customer side because they can avoid the hassle of adding another vendor just for reverse ETL. If they're using Segment Connections and profiles, it's really easy to integrate and just add reverse ETL and try it out within Segment, and it completes your accessibility activation story for the CDP.

And then I think the other key piece of this is that the customer experience is gonna start mattering a lot more. So as we focus on efficiency, you know, customers don't wanna spend a ton of money, like, buying data tools and then standing up their own governance or observability tools to string these tools together and make them ready for prime time. They want the tools to work. They wanna get started easily, and they, you know, are just gonna want this, like, smooth CX in between the tools so that they work great together.

Really well said. At the end of the day, what matters is the experiences that customers have with your product and your business. And I think we can often get caught up in, you know, data architecture and data engineering problems, but at the end of the day, this is really servicing the business and the customer journey. And so the fastest path to provide that great experience is often the way in which you're gonna learn the most and you're going to be able to deliver on customer expectations. So I think that's been 1 of the things that we've really heard from our customers: time to value is really important right now in this moment as everybody's facing challenges to their business. And so finding that fastest path is key.

And for people who are exploring the challenge and the available options for managing their customer profiles, what are the cases where Segment Unify is the wrong choice?

Well, I think if you don't have a data warehouse, you know, you won't be able to take advantage of these tools. It might be time to consider getting a data warehouse and adding it to your data stack. And then I think the second is, you know, we see reverse ETL as a fundamental part of the CDP. Our reverse ETL solutions work great with event streaming, with our real time customer profiles. If you're just looking for a point solution for reverse ETL to get data in and out of different warehouse tables, then Segment Unify probably isn't the right choice for you here. It'll get the job done to, like, move data from point a to point b, but I think you won't get the full value without the CDP side.

And as you continue to iterate on and improve the Unify product, what are some of the things you have planned for the near to medium term or any particular projects or problem areas you're excited to dig into?

Really excited about our entities project this year. So entities is all around expanding the world of profiles, going beyond, so bringing in all the business objects surrounding the customer profile. So things like accounts, your households, your subscriptions, even your pets, and bringing that together into a full view, and then really allowing customers to easily marry that rich, stateful representation with the real time profiles that we already have in Segment and bringing it all together so that we could really drive these amazing, dreamy personalization campaigns that folks are really chasing. What are you most excited about, Kevin?

That's 1 of them for sure. And then, I think there's also obviously a lot going on in the world of AI and large language models. And so really thinking about how do you bring context and fine tune those with relevant data within your business? I think that's going to be absolutely paramount. And so, you know, I think that's something that's top of mind for many businesses and customers today and something that we're investigating.

Are there any other aspects of the Segment Unify product and the overall space of the kind of golden profiles and entity resolution and reverse ETL that it enables that we didn't discuss yet that you'd like to cover before we close out the show?

I think 1 of the things that is just continuing to blow my mind is the pace of adoption here that we're seeing among our customer base, the excitement around these products and features. And so, you know, I really do think that there is power at the intersection of real time data and a lot of the investments that businesses have made over the years in their data warehouse strategy. And so bringing those 2 things together feels like we've really hit a chord, and it is really unlocking a ton of potential for businesses that has otherwise been latent.

Alright. Well, for anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get each of your perspectives on what you see as being the biggest gap in the tooling or technology that's available for data management today.

Yeah. Happy to lead off. So, as I mentioned, AI and large language models are obviously very top of mind for folks. I think if you've played around with them, 1 of the things that you realize is that the experiences can be really powerful, but they are also largely generic. The context is lacking. And so I think 1 of the things that's top of mind is how do you fine tune these with customer context to get the most out of these large language models and copilots. And so that feels like a gap. I know a lot of different folks are exploring that today, but it's an unsolved problem that is really emerging as these AI and large language models grow in popularity.

I'm excited to see where we're gonna go with the semantic layer in the modern data stack. So the semantic layer is about defining metrics, and it's kind of that missing link between the raw data and business meaning. Think about it as, like, kind of the Rosetta Stone for your business metrics. Metrics definitions to date are locked up in analytics tools, spread out across all of your engagement applications. They're not shareable. So it's very hard to build these, like, the stream of cross org engagement apps and experiences easily. Like, you know, if you are a support agent and you're getting a lot of returns from a customer, entering them automatically into a special, you know, more hand-held support experience or marketing experience so that you can kinda reduce the number of returns they're sending. I might be someone that falls into that bucket. And so I think building, you know, 1 metrics layer that is both very flexible for however, you know, you wanna use them downstream, but has some shared artifacts so that your data engineers don't need to, like, recreate the same definition of a customer over and over again, is really exciting. I think there's still a lot of opportunity in this space, and we're starting to see some early innovations with, like, dbt and Transform, but I think we're still searching for the right approach here within the modern data stack.

Alright. Well, thank you both for taking the time today to join me and share the work that you've done on the Segment Unify product. It's definitely great to see that released and available for folks who are able to simplify the process of managing their customer profiles and enriching them and bringing them everywhere that they need them. So I appreciate all of the time and energy that you've put into that, and I hope you enjoy the rest of your day.

Thanks so much for having us on.

Thank you. This was so fun.

Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
