Summary
The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result, there have been a number of trends in data that take advantage of the warehouse as a single focal point. Among those trends is the advent of operational analytics, which completes the cycle of data from collection, through analysis, to driving further action. In this episode Boris Jabes, CEO of Census, explains how the work of synchronizing cleaned and consolidated data about your customers back into the systems that you use to interact with those customers allows for a powerful feedback loop that has been missing in data systems until now. He also discusses how Census makes that synchronization easy to manage, how it fits with the growth of data quality tooling, and how you can start using it today.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit, a half-day virtual event featuring the first U.S. Chief Data Scientist, the founder of the Data Mesh, the creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.
- Your host is Tobias Macey and today I’m interviewing Boris Jabes about Census and the growing category of operational analytics
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Census is and the story behind it?
- The terms "reverse ETL" and "operational analytics" have started being used for similar, and often interchangeable, purposes. What are your thoughts on the semantic and concrete differences between these phrases?
- What are the motivating factors for adding operational analytics or "data activation" to an organization’s data platform?
- This is a nascent but quickly growing market with a number of products and projects operating in the space. How would you characterize the current state of the segment and Census’ position in it?
- Can you describe how the Census platform is implemented?
- What are some of the early design choices that have had to be refactored or augmented as you have evolved the product and worked with customers?
- What are some of the assumptions that you had about the needs and uses for the platform which have been challenged or changed as you dug deeper into the problem?
- Can you describe the workflow for a customer adopting Census?
- What are some of the data modeling practices that make it easier to "activate" the organization’s data?
- Another recent trend in the data industry is the growth of data quality and data lineage tools. What is involved in using the measured quality or lineage information as a signal in the operational systems, or to prevent a synchronization?
- How can users test and validate their workflows in Census?
- What are the options for propagating Census’ runtime information back into lineage and data quality tracking?
- Census supports incremental syncs from the warehouse. What are the opportunities for bringing streaming architectures to the space of operational analytics?
- What are the challenges/complexities in the current set of technologies that act as a barrier?
- What are the most interesting, innovative, or unexpected ways that you have seen Census used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Census?
- When is Census the wrong choice?
- What do you have planned for the future of Census?
Contact Info
- Website
- @borisjabes on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's a t l a n, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's l i n o d e, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Boris Jabes about Census and the growing category of operational analytics. So, Boris, can you start by introducing yourself? Yeah. Hi. Thanks for having me. So, yeah, I'm the CEO over at Census,
[00:02:09] Unknown:
where we've been building what people call a reverse ETL tool for the last few years. We were founded in 2018. I'm a lifelong kind of tool builder. Before this, I ran a company that helped people manage passwords and kind of deal with employee identity. And before that, I worked on Visual Studio, you know, the godfather of programming tools. And do you remember how you first got involved in data management? Yeah. So in 2017 and 2018, there were a couple of things that were happening to me and my team, and to the people we were working with. We were kind of always obsessed with customer data in some form or another.
And I think the earliest kind of signal or experience we had that caused us to build Census was actually a few years back. I sold my last company and started working across Boston and San Francisco, collaborating with, you know, sales and marketing people that were not on my team. Right? It's a very standard thing that happens in an acquisition. And it wasn't going that well. And I did the classic thing at the time, which is to blame the culture and the people, because that's easy. And the truth is, we were a product-led startup. We were very, like, self-serve software. You know, we had an operator console and all these things. Sitting in San Francisco, I could train everyone on my team to be able to look stuff up.
And the truth was that, you know, these other sales and marketing folks didn't have access to that. And so they didn't know how to message our customers. They didn't know how to do, you know, useful outreach based on what they were doing. And so that's actually where the background thread for Census began, which was: maybe the issue is not just cultural, but it's actually a connection problem between product and, let's call it, analytics teams versus sales, marketing, and support teams, and how they live in different tools, and maybe there's something there that's missing. And so let's call that the germ of Census, right? That's where it kind of began. And then as I started digging into this, you know, it really just hit a weird part of my brain that is completely obsessed with what I'll call certain kinds of data federation problems.
So my first company and Census are both actually solving a very similar problem, which is that in the world of SaaS, where everything has expanded onto the Internet and everyone's been using software as a service for the last ten-plus years, we now have just an explosion of software as a service inside a company. I think one of our customers has over 300 apps that they use, right? Independent web apps. And there are all these things that occur because of that that are chaotic and actually make things worse. So one is, you know, security and logging in and employee identity. That was my first company. And then customer data is just kind of replicated independently in all these products.
And so my brain went to, you know, this necessary, better design, which is that we need this centralized and federated somehow, because you really can't have a plethora of tools dealing with the same nouns and schemas, yet have them rebuilt from scratch independently and then wired together kind of ad hoc. And so that's how Census was born, and that's why we named the product Census, right, to solve that problem. So that was about 2017, 2018.
[00:05:45] Unknown:
And so for people who aren't familiar with Census, I'm wondering if you can just give a bit more detail on what it is that you're building there. And you've already given some of the story behind how it came about, but maybe where it sits in this overall sort of nascent space of operational analytics versus reverse ETL.
[00:06:02] Unknown:
So Census at its simplest is a tool that connects to a data warehouse, a cloud data warehouse, and allows you to extract models, insights, metrics that you have in your warehouse and sync them seamlessly into tools that people use to actually do their work every day. So Salesforce, Marketo, Zendesk, etcetera. So this is, you know, going beyond what people do in a BI tool, where you're looking at data and you have really amazing charts, and actually pushing data directly into systems of action, where a salesperson or a marketer or a support person is actually doing their work every day, in their pane of glass. Right? So that's what Census does. And this is what people would now colloquially call reverse ETL as a tool. You can think of how that works as being about connecting to a warehouse and moving data seamlessly out. Right. So that's why this term reverse ETL emerged.
I don't like to obsess about the fact that it's kind of a misnomer, since ETL has no direction, but that's roughly what reverse ETL means. It's the tool, the process, by which you get data from a warehouse seamlessly, efficiently, you know, incrementally, out to the destinations, the tools that people use to do their work. Now, operational analytics, you could think of as what that enables, the methodology that comes out of using a tool like this, which is the idea that analytics is actually at the center of all your operations, rather than the two being independent entities in a company. That's actually what's important here about what we do. It's less about moving the bits around, actually, and much more about helping data teams position themselves at the center of their company, and specifically at the center of all these kind of functional workflows that are driving the business.
That's kind of what I think of as my life mission, or the company's life mission, which is to say, instead of a data team being this quarterly, retroactive look back at, you know, how much revenue did we make, instead they should be drivers of, you know, how do we increase revenue, or how do we improve our support, or how do we better reach our new customers, right, etcetera. And data teams are pretty much the only team in the company, right, that has access to all the information that's happening, from inside and outside. Right? And so they're actually uniquely suited to being the broker to enable all this. You know, generally, I find it frustrating to watch our users or prospective users be stuck kind of just dealing with the request of the week, the request of the day. Please fix this chart. Please make me another chart. And I think they have just a lot more to offer, because they can think in terms of what are interesting KPIs. They can invent new KPIs and then push them out to the rest of the organization.
And not just push them through a picture that people look at in a board meeting, but actually push them directly, all the way into, you know, the pane of glass that every single person sits in in the company.
[00:09:11] Unknown:
Yeah. And one of the recent trends, maybe within the past five years, in business intelligence is the recognition that just seeing the numbers tell you something isn't really useful, because then you have to actually take some action based on that information. And if there's no way within that context to actually take that action, then the whole feedback loop becomes disconnected, and you have, you know, multiple different activities and no concrete way to tie them all together and actually complete that loop and have those feedback cycles. And I'm wondering what you're seeing in terms of the space of operational analytics and its relation to that idea of actionable intelligence, and not just business intelligence, and maybe some of the ways that you're able to use something like Census to push this data into these operational systems and have that feed back, that cycle, into the business intelligence dashboards and the other key performance indicators that the business is tracking?
[00:10:09] Unknown:
Yeah. So, I mean, I don't like to hit people over the head with, you know, the kind of Gartner-style feedback loop picture that you would draw here, but you're completely right. That is the goal, right, to improve your feedback loops. And that's arguably the goal of all business and most human endeavor: to get better through iteration. And so how do you do that? You do that by tightening the iteration time. Right? Most products and companies that perform better do so because their iteration time is shorter, so you can close the feedback loop quicker. Right? This is why we even invented all these, like, A/B testing tools for websites: to get the loop happening faster. And two is you've gotta, you know, wire the entire thing together such that you can actually measure an objective function that you can improve. Right.
And so, you know, I think the state of the art at most companies before we came around was: work very hard to come up with KPIs, which are kind of like an objective function, and they're not trivial to build. Right? Because you have to clean the data correctly. You have to define the KPI precisely. Right? And then the loop was maybe do a project and then come together in time, let's say at the end of a week or at the end of a quarter, to observe the KPI and then determine, did what we do affect that or not? And was it causal or not? And then have this kind of human debate and decide whether we should adjust direction or not. Right. And with Census, I have seen this firsthand. It has shifted a lot of these things, because what you're doing is something more like... let's take one of our classic scenarios that people use Census for, which is in these kind of high-scale software companies, they have tons of free users. Right? Tons and tons of free users, like millions, if not tens of millions, of free users.
And you can't literally pick up the phone and call every single one, right, to say, hey, are you interested in our, like, enterprise feature? You know, you can't do that. It's simply not scalable. And so you're gonna create some kind of metric, right, for determining which users might be interesting to call. So that's a kind of query. Right? That's a kind of KPI. It's actually often lots of variables that come into play. Right? It'll be things like how actively they are using features A, B, C; how big they are; how big their company is. Right. All sorts of things that are 100% bespoke to our customer, right? Not anything we could build a priori.
And you'll then tell the sales team, like, hey, these are the people to call. Right? And then you'll hope that that works out. But with Census, what people do is they'll take that model, right, of, out of our free users, who do we deem to be attractive, and automatically sync that, right, to their sales team. So that literally generates leads and opportunities, in kind of sales parlance. And so there's a list of people to start calling. Right? The salesperson can just go. They don't even have to wonder, like, why this user is at the top of their list. It'll tell them. Right? It'll say, they're active in this way, this is the champion at that company, and so on and so forth. And so they'll just start making the calls. And of course, you're re-ingesting all that CRM data and billing data back into the warehouse.
You can track on a cohorted basis, on a time basis, how are we converting the leads that we generate? The data team can do all that, and do it in an automated way. And so you can start to tweak. Like, realize that, you know what? Our win rates are actually pretty bad, but if you slice it by employee count, it's really good above this number. So just tweak the KPI to say, okay, companies greater than this size. And now you've increased your win rate, you've reduced the busywork, and so on and so forth. Right? Or vice versa, you could kind of open up the spigot if things are going really, really well. And all of that is happening with almost no human intervention, no meeting required, and no, like, strife between, you know, the data team and the sales team. It's just, we can see what converts.
And there's a lot more you could do here. Once you have all that wired, there's actually more that people could do than they are doing. And, like, I've got tons of ideas for them to improve that even further, right? But that's the kind of feedback loop you want, and that's just one out of, you know, dozens that people are able to achieve here. So when we first went out to market, if you go back to our old materials from, wow, more than a year and a half ago, it was very much about this idea that we're the last mile in the data stack, because it completes that feedback loop. Right? It's like, you have tools that perfectly, you know, ingest data from tools like Salesforce.
And what we do is take the work that the data team does, which is massive and complex, and push it back out, and now you have a loop. Right? I just don't like pitching a loop to end users, because it's not what they're thinking about in the morning.
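To make that loop concrete for readers: here is a minimal sketch of what such a lead-scoring model might look like as Postgres-flavored warehouse SQL. Every table, column, and threshold is hypothetical; the point is that the scoring logic lives in the warehouse, where it can be tuned as conversion data is re-ingested from the CRM.

```sql
-- Hypothetical "product qualified lead" model; all names and thresholds are
-- illustrative. Scores free users by recent usage and company size, keeping
-- only the accounts deemed worth a sales call.
select
    u.user_id,
    u.email,
    c.company_name,
    c.employee_count,
    count(e.event_id) as events_last_30d
from analytics.users u
join analytics.companies c
    on c.company_id = u.company_id
left join analytics.events e
    on e.user_id = u.user_id
    and e.occurred_at >= current_date - interval '30 days'
where u.plan = 'free'
group by u.user_id, u.email, c.company_name, c.employee_count
-- These thresholds are exactly the knobs described above: once CRM conversion
-- data flows back into the warehouse, raise the employee count floor if win
-- rates are only good above a certain company size, or loosen it to open the
-- spigot.
having count(e.event_id) >= 50
   and c.employee_count >= 100;
```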
[00:15:01] Unknown:
And so digging deeper into this concept of reverse ETL and operational analytics, you mentioned how you take issue with the term reverse ETL because there is no implied directionality to it. But, you know, historically there has been one direction, as you mentioned: from your operational tools into your central data system, so that you can then perform analysis, put it up on a pretty dashboard, and the job's done. I'm wondering if you can just talk through some of your thoughts on why the past couple of years have been the time period where operational analytics and this whole category of tooling has kind of spontaneously sprung up from a number of different sources.
[00:15:39] Unknown:
So when Census was born in 2018, you know, we just provided this tool that connected to a warehouse and, you know, pushed data out to Salesforce, etcetera. And our users kind of spontaneously referred to it as a reverse Fivetran, which I think is a very reasonable metaphor, because it had the ease of use of Fivetran. Right. And it did go in the opposite direction. So I think that's a very reasonable description. And then as the market grew, people started to use this broader term, which is slightly less precise, but the core idea is still there. But really, like I said, reverse ETL is just the how. Right? It's just how bits move. I think the rise of operational analytics, which I'd like to think we helped catalyze, but which was gonna happen regardless, is that teams are trying to get more value out of their data. Like, the early years of the, let's call it, modern data stack were first and foremost just getting to the cloud. Right? Step one, get to the cloud. And these transformations take time. Right? Snowflake took a long time to become the obvious thing that we now see today. It's an overnight success ten years in the making. Right.
And so I think that was the majority of what people were trying to solve for. And then getting to efficient BI, I think we're still on the journey there, in terms of having every company have really, really brilliant BI, both in terms of visualizations and the kind of organization of the data itself, to be more useful to the broader set of people in the company. But I think once you've made those investments as a data team, you start thinking about how to have more leverage. Right? And I think we came in at a really great time for a lot of companies, where they had small data teams who had made these amazing investments in cloud data infrastructure that gives you a lot of capabilities that are not too hard to deploy and manage. And we gave them this tool that was also very easy to manage.
That meant you didn't have to ask an engineer to kind of wire up data for you, and it gave you dramatically more leverage. I'd like to think that in the arsenal of data tools, we might be one of the ones with the highest amount of individual leverage, right? Where you take a single data analyst, or an analytics engineer, whatever you wanna call them today, and you give them this tool, and they can suddenly impact the entire sales team, the entire marketing team, the entire support team. And this is what we saw happen at our customers. Like, the clout of our users went up dramatically. They started being invited to meetings they were never in. They started to kind of turn off some of the homegrown systems that, you know, marketing teams had set up, because they were wrong.
But it made sense for them to exist, because the marketing team was craving this information and didn't have the data team working with them, so they just did it on their own. And so I've seen data teams that we work with first start to add a little bit into these destination tools and then slowly but surely just take over the entire pipeline, both making the company's systems as a whole cleaner, because it's all going through one, you know, central clearinghouse that is the warehouse, and increasing, like I said, the power of the data team, and that's been really exciting to watch. But I'd say that's why you see the rise of operational analytics: because people have invested in great infrastructure and they wanna get more out of it.
[00:19:18] Unknown:
To your point of the, you know, homegrown solutions often being wrong, I'm wondering if you can talk to some of the ways that the other rising, nascent category of data quality tooling is able to factor into the space of operational analytics: being able to validate the flows that are being pushed into these operational tools and perform these quality checks, and some of the ways that things like the investment in lineage tracking, and being able to understand the upstream and downstream impacts of these transformations, factor into the ways that Census and operational analytics are being applied in the organization.
[00:19:51] Unknown:
First, let's not minimize this. The ways in which data is wrong in a company are myriad. And the first thing that I was pointing out was there are people dealing with data outside of the data team, and their errors are, you know, plentiful and outside of the kind of, let's call it, control of the data team. The phrase data lineage doesn't exist on that side of the house. So I think the first step I've tried to help companies achieve, right, is to say your marketing team or your sales team has actually built little micro data stacks of their own. They just don't manage them well. And they make mistakes with their data that are less bugs and more straight-up misunderstandings of the data. So they aggregate data differently than you do, because they don't know what the correct aggregation is.
You know, it takes time and expertise to understand, let's call it, dates and revenue over time, and it takes an accounting team getting angry at you to build the correct revenue KPI, which you might not do as a, you know, quick-and-dirty solution when you're connecting, like, your Stripe to Salesforce directly or something. And so the first thing is, by connecting the data stack out to the operational tools, you first benefit from bringing the data team's expertise and correctness about metrics out to, you know, those teams. So let's not underestimate that, because all those teams don't really know how to build KPIs correctly at all, and they'll approximate them, whereas the data team is at least, I'll call it, conscientious about building these KPIs semantically.
So the first thing you'd still wanna do is, like, invest in a tool like ours to get that data out. But then something very interesting occurs, which is what you're pointing out, which is if the data team is going to be in the hot seat for providing information into these systems, then the impact of being wrong on a metric goes up. This is the whole beauty of leverage. It's a double-edged sword. Right? You magnify every KPI's impact on the company. Therefore, if it's wrong, it's gonna be much worse. And so I think as the data stack has evolved to be more central to a company's operations, we've seen not just tools like Census emerge to connect it to that world, but also data quality and testing and lineage tools emerge to help make sure that data teams don't screw up, right, to make sure that data doesn't become wrong.
And, you know, I think one of my friends over at one of those companies makes a good joke that, like, if you've never provided bad data to someone, you haven't been in business long enough. Right? You haven't been a data person long enough, which I think is very true. And so there's a lot to this. Right? There's, like, what kinds of errors are we trying to catch, and what are the kinds of failures that occur? And so even here, I would differentiate between two categories of errors. There are errors in metrics themselves, which a lot of the data observability tools are helping companies manage, which I think is awesome.
So it might be due to the fact that your ingestion layer has failed to bring data in. It could be that your query became wrong because it didn't factor in, like, certain nulls that started appearing in the data. Right? There's a lot of ways in which your metrics can become incorrect. Right? And that's what these observability tools solve for. And the lineage, in my mind, just helps you debug that. Right? But then there's a whole other category of errors that I think tools like Census actually solve for, which may not be as obvious to people, but I think that is our job to be done here: there are a lot of ways in which your data quality goes down, not because your query is wrong or the shape of the data has changed, but because the wiring is not working correctly. Right? Or there's implicit wiring at a company. And so this is what we often see that people replace. Right? So, like, you have your homegrown script that, you know... here, I'll give you a great example. A lot of destination systems are not like a warehouse. They actually have a lot of transient, weird failures.
They can reject data for a lot of reasons. They might reject you putting data into them because, you know, a name field is too long, stuff like that. Right? And they'll just reject the entire row of data, and you won't necessarily realize that. So you start to get skew between your source of truth, which is the warehouse, and these destination systems. And that is something where a great tool like ours can make a big difference, because that's what we track end to end. Right? We can signal to you that there's skew being generated between your destinations and your source of truth, which is the warehouse. And that's kind of data quality in the day-to-day, in the wild. Right? Like, this is how you end up having bad data quality. It's not just that your KPIs have tests, whether that's dbt tests or using tools like, you know, Monte Carlo and Bigeye, etcetera.
It's because you're actually not getting every single data point to where it needs to go. And that's what we wanna make sure never happens. Right? So we'd like it to be that if your metric is correct, and you have observability tools to make sure your metrics stay correct, you can be assured that everything in the constellation, every satellite of the warehouse, has that data. And if it doesn't, we will also participate in what I'll call the modern data stack kind of observability tooling, which is to alert you when these things are starting to get out of sync.
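A hedged illustration of the kind of skew check being described: since the CRM data is re-ingested into the warehouse anyway, a data team can diff the warehouse's source of truth against the destination's copy. All table and column names below are invented, and `is distinct from` is the Postgres/Snowflake-style null-safe comparison.

```sql
-- Hypothetical skew audit: rows that exist in the warehouse model but are
-- missing or stale in the re-ingested Salesforce copy (e.g. because the
-- destination API silently rejected them).
select
    src.customer_id,
    src.lifetime_value as warehouse_value,
    dest.lifetime_value as crm_value
from marts.customers src
left join salesforce_ingest.contacts dest
    on dest.external_id = src.customer_id
where dest.external_id is null                                 -- never arrived
   or dest.lifetime_value is distinct from src.lifetime_value; -- drifted
```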
[00:25:26] Unknown:
Another interesting element of this whole paradigm of closing the loop, and this question of data quality, is that you're pushing data into these operational systems and then pulling it back out again to, like you said, close the loop and see how our categorizations of potential clients are panning out in terms of how they actually perform, and close that iteration cycle. You also have the data quality in the ingestion layer, where you're pulling data from those operational systems into your warehouse. How are you able to link up those in-and-out patterns and actually reuse some of these data quality checks, to ensure that the data you're pushing back into these systems adheres to the checks that you're trying to enforce on its way in?
[00:26:10] Unknown:
Totally. Totally. I think your mind is exactly in the right place. I think people talk about data lineage for good reason. Most people think about it very zoomed in, inside the warehouse. Right? Like, you have an Airflow task that does this, and you have a dbt run that generates some more models, etcetera, and so you have a lineage of how this occurs. But I think Census is actually really, really well suited to what I'll call end-to-end lineage, which is what you're talking about: a column changes, and that gets ingested by Fivetran.
And it ends up, through a series of transformations, feeding a field all the way back in your CRM. Right? And we sit in such a place that we can see the data arrive in the warehouse, watch it as part of a larger transformation workflow, and then push it back into the destination. So that's actually where it's most interesting to use Census: to think about the lineage from an end-to-end perspective. So the way we think about this today is there are conventions to follow, and then there is, you know, monitoring that we can provide. Right? So there are two things that I tend to recommend to people.
One is ensuring that you have some level of delineation within your warehouse between source data that is being pulled in from an ingestion tool, somewhat staged transformations, and then some relatively clear set of, let's call it, data marts that you are providing out to arbitrary tools. Right? And then we natively connect to dbt. Right? So we can see the entire model life cycle within dbt. We can see the dependencies between models in the warehouse and can monitor how tables are being populated. And so the way I think about this is, if you have a clean delineation, we can come in and tell you why something is skewing, right, let's call it, or suddenly things are no longer updating in a destination, or this field suddenly has a bunch of nulls, right, or a bunch of duplicates.
These are the things that people deal with all the time. So we tend to prevent those failures, and then you can kind of walk up the chain of transformations to understand why these things are happening. But to your question about how do I tighten the feedback loop between a metric in the CRM and the metric that is feeding it, you know, end to end, which I think is kinda what you're getting at: I'd say we don't do that very well yet, but it's definitely the kind of thing that we're always thinking about, which is, can we even build a straight-up feedback loop on a metric? Can you put an optimization goal on a metric and, you know, use the source metrics to feed it?
Technologically, Census is able to do that. We just don't have, like, a UI for that today. But I think that's the right place to go. I like what you're thinking.
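For readers, a sketch of the delineation convention he recommends, with invented schema and model names: raw ingested data, staging transformations, and a clearly marked marts layer that is the only surface exposed to a tool like Census.

```sql
-- Hypothetical layering convention; only the marts schema would be scoped to
-- Census, so raw and staging data can never be synced out by accident.
-- raw.*     : untouched landing tables from the ingestion tool
-- staging.* : cleaned and renamed, one model per source table
-- marts.*   : production-ready entities that are safe to expose

create view staging.stg_customers as
select
    id as customer_id,
    lower(email) as email,
    created_at
from raw.app_customers
where deleted_at is null;

create view marts.customers as
select
    customer_id,
    email,
    created_at
from staging.stg_customers;
```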
[00:29:20] Unknown:
And so now digging more into the Census platform itself, can you talk through some of the ways that it's implemented, and the overall architecture and implementation details of how you're able to build this system to pull from the warehouse and push into these operational tools, and make it approachable and accessible to people throughout the organization, and not just the, you know, very technical users on the data team?
[00:29:42] Unknown:
So Census connects to warehouses in kind of the standard way you'd expect. Right? You give it a role and access to the warehouse. As a data team, you can scope Census' access to a set of schemas. That way you can have more users in the tool, so that you can say, like, look, there are schemas that are just used by the low-level data team to, you know, transform stuff. There's a lot of staging data in here. Like, we don't want people to kind of accidentally pull this into a destination system, because it's really not production-ready. And so in Census you can scope, you can say, I want to expose these things, right, these sets of schemas, these sets of views, tables, etcetera.
And that way you can have more of a self-service kind of experience for the somewhat SQL-savvy, but, like, reasonable, you know, users on the other end that just want to pull the data that they need and put it in the tools that they're using. So that's kind of how you connect Census. And Census is a SaaS tool, so there's nothing to deploy on your side. And in fact, one of the things we built from the first day of the product is that we don't store any of your data. All of the data that matters lives in your warehouse. And that's one of my favorite features of modern cloud data warehouses: we use them as a backing store for Census, so that if you cut the cord, we have nothing from your company.
And in a world of, you know, this kind of secular push towards having more data ownership in companies, and a lot of privacy infrastructure being developed in every company that we work with, this is a really important aspect of the tool. It's important and invisible. Right? So it's something I like to tell people. And then the way Census works, right, is you choose models. We natively connect to dbt, so you can choose models out of your dbt project, you can write a query, you know, you can just pick a view, and then you have this what-you-see-is-what-you-get, point-and-click ability to map that to all these destinations. The way we connect to those destinations is using their APIs.
So we have standard credentials into those tools, and, you know, we can read your destinations and pull in all of the unique features of your company. So believe it or not, you know, those destination products, whether that's Salesforce or Zendesk or you name it, people have customized them. Right? They have added their fields. They've added their rules. And all of that we pull in and surface in our mapping UI, so that you can determine how you wanna map things. Right? And the biggest feature here that, again, is under the covers is that the most important thing for Census is not just to get your data out into those tools. Because that is hard. Don't get me wrong. These APIs are super complicated and fail in terrible ways. So just getting the data across is very, very hard, but you wanna get the data out correctly. Right? And so to your point about data quality, the core feature here is that we want to make sure you're not pushing bad data. And that comes in two ways. One is we sit at the end of your transformation flow.
So you can ensure that a sync in Census only occurs upon successful completion of, like, a dbt test. Right? So that's built into our capability. But beyond that, we have built an understanding of every one of these tools to make sure you don't push data that is invalid by the marketing team's definition or the sales team's definition. So for example, it might be very reasonable for the data team for users to have a null value on pricing. Like, you can have a pricing plan column and it's null, right? For the data team, that might be a very valid schema. Right?
But that might not be acceptable for someone who's using this data on the marketing team or on the sales team, and they need something else. They either need to cut those people out of the query or assign a default value of something else, and that is what Census allows you to do and guarantees that you don't screw up. Right? It's those kinds of last-mile validations, I'd say, that Census takes care of. And then the rest is the kind of, let's call it, ETL expectations that everyone should have, right? It always runs, it runs on a schedule, it can be triggered by workflow tools or orchestration tools. We monitor everything, right? So we'll tell you when things break, we'll tell you when things partially break, because you have different levels of breakage in this world, and things can break for all sorts of reasons. Right? Someone can delete the field that you're mapping in Salesforce, and suddenly, like, the data can't be pushed anymore. We'll notify people for those things.
You can notify in, you know, Slack and email, all these kinds of things you'd expect at this point. And the goal here is, you know, Census is a bridge product. Right? It's a bridge between the data team and the, let's call it, marketing ops, sales ops, you know, CS ops teams, all these kind of business operational teams. And so they'll get alerted too. And a lot of times, it's their job to fix it. Right? So take something as simple as you've taken a table of leads and you're syncing it into Salesforce opportunities. Right?
And a sales ops person changed some of the schemas in Salesforce, thus breaking the sync. Right? And I'd love to be able to make Salesforce participate in the data lineage tooling that we've built. Right? But we're not there yet. So what will happen is you'll change Salesforce, and Census will alert you. Whenever the next iteration of our syncs runs, you'll get an email, you'll get an alert in your alerting tool, whatever monitoring tool you use, and it'll say, hey, this sync is broken because this field was deleted, and you have to fix it. You can either delete the mapping, update the field, resurrect the field, whatever needs to be done. And very much, this is not just the responsibility of the data team. Otherwise, data teams would really not wanna use Census.
If everything related to breakage in every tool went to them, I think that would not be palatable. So very much the goal here is to say, when the marketing tool breaks a sync, the marketing ops team is notified, and the data team is notified just in case, you know, so they're aware. Right? But there's often a directionality of who should fix things when things are broken, and all of that is kinda built into the platform.
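A small sketch of the last-mile guard he describes, in warehouse SQL with invented names: rather than letting a null pricing plan reach the destination, the model feeding the sync either assigns an agreed default or cuts those rows out.

```sql
-- Hypothetical pre-sync model guarding against nulls the destination's rules
-- won't accept.
-- Option 1: assign the default value the marketing team agreed on.
select
    customer_id,
    coalesce(pricing_plan, 'free') as pricing_plan
from marts.customers;

-- Option 2: cut the offending rows out of the sync entirely.
select
    customer_id,
    pricing_plan
from marts.customers
where pricing_plan is not null;
```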
[00:36:10] Unknown:
And as far as the collaboration aspect, you mentioned that the, you know, the sales team or the marketing team can say that this value should never be null, or this is the default that I want. What is their interface for working with the data team to set those rules and maintain them in Census?
[00:36:29] Unknown:
So there are a couple of things, and certain sales teams or marketing teams use products that can enforce that, which is nice. So the correct, native way to do this is for them to go into, let's say, Salesforce. Right? And you can set fields. Right? Salesforce is really just a CRUD database UI. And so you can set fields to have these rules, right? You can say this is a non-null field. And if you do that, then Census takes care of the rest. We detect that rule and prevent that sync from succeeding, and thus, like, you might not even be able to save the sync if you're trying to sync incorrectly, or you'll get alerted for all the rows that fail the test, right? So Census basically takes in all the knowledge that you've put in there and computes on it, right? You could almost think of it as a compiler, right? It's taking what it knows about the destination, the source, every single row, and then determines which rows, you know, fit the mold.
And then you'll get alerted for the rows that don't. And, again, you can have the sales team or sales ops team be alerted, or the data team be alerted. So the most common way people do this is, like, push the logic into a tool that can implement that logic. And if not, then you'd wanna be able to build that rule as, effectively, a dbt test, which, you know, some sales ops teams can't quite do yet. So it kind of spans those two worlds. And then we're working on some things to make that a little more fun. That's what we're cooking up still.
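For readers unfamiliar with the dbt side of that: a singular dbt test is just a SQL file that returns the rows violating a rule, and an empty result means the test passes. A hedged sketch with invented names:

```sql
-- tests/assert_no_null_pricing_plan.sql (hypothetical dbt singular test)
-- dbt fails the test if this query returns any rows; a sync gated on the test
-- run is then held back instead of pushing bad data to the destination.
select customer_id
from {{ ref('customers') }}
where pricing_plan is null
```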
[00:37:59] Unknown:
Struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end-to-end, fully automated data observability platform. In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing the time to detection and resolution from weeks or days to just minutes.
Start trusting your data with Monte Carlo today. Visit dataengineeringpodcast.com/impact today to save your spot at Impact, the data observability summit, a half-day virtual event featuring the first US chief data scientist, the founder of the data mesh, the creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 people who RSVP will be entered to win an Oculus Quest 2. From the side of the data team, I'm wondering what are some of the common data modeling tasks that are useful or necessary to effectively leverage something like Census, and some of the communications that they need to have with the sales and marketing teams to understand the context of the information that they're working with? What are the questions that we need to be able to answer as we're delivering the data into these different tools?
[00:39:31] Unknown:
This is huge. I actually think this is a lot of the work to be done. I think over the next couple of years, that's where investments will pay off the most for data teams here. Because historically, right, and I don't mean ancient history, but, like, you know, relatively recent history, the primary goal of a data team, beyond moving to, you know, cloud data infrastructure, which is a means to an end, was to provide BI. Right? And most KPIs in BI are aggregated at the whole-company level. Right? So how much revenue did we make this quarter? Or by cohort, how much, you know, revenue did we make? And what is our, you know, net retention, etcetera, etcetera.
But for operational analytics, you actually have to move away from overall aggregated data and into cohorted data, all the way down to the individual. And that's the biggest change data teams need to make in their data models. That can be a significant shift. Right? Because your unique company IDs or user IDs might be disappearing pretty quickly in your data transformations as you get to, you know, taking all your invoice data and just aggregating it into how much money we made. You might not care from which customer. Right? And so that's probably where there's real work to be done that cannot be done for you. I don't think we're yet at the point where, you know, a magic Census AI can cohort your data down to the individual.
So that's where data teams spend their time to better work with Census, but probably more to better work with their sales and marketing teams, for whom everything they do is at the individual level. Right? An individual could be a company or a user. Right? So you have to think about that. What is a user? Is a user an email, or is a user a database ID? But more importantly, is a user the same across two SKUs of your product? What if you have three, you know, modules in your product that people can independently sign up for? What is a user then? Right? Do you treat it as one across all three or not? And you wanna work with the marketing team on this, because you as a data team might invent 25 ways to do that. But the marketing team is telling you, I actually wanna write an email that is not in triplicate, even though we have three products, when we announce this. But I'd also like to be able to email, you know, people individually about that one product, or I wanna be able to tell them about product C when they're using products A and B. Which means you have to have a single user that spans all the products, which is real work, right? To deduplicate them, find a key that covers all three.
This is where people usually fall back on email anyway. But that might not be correct. So that's the work to be done. And without that, the marketing team will not be able to access your data correctly. And then you're back to square one, which we talked about: the data is incorrect. Right? Not because of, you know, Census, the tool, breaking, but because you're not aggregating at the right level for the use case. So that's super, super important. And I'd say most companies are still kind of early in figuring that out. A lot of our users that have been long-time Census users, so, you know, companies that have been using Census for two, three years, are pretty good at this, out of necessity. Right? But I think if all you've ever done is BI, this is gonna be the area where you need to improve. Same thing for companies. What is a company? Is a workspace in your app a company? Is a company a collection of workspaces in your app? Does your app even have workspaces?
Is it tied to an email domain, etcetera? Right? You have to figure these things out, because without that, you don't have the right key for aggregating those KPIs. Once you do, you can then say, okay, well, let's look at revenue by company, let's look at usage by company, let's look at the most active user in a company. And so that involves communication with the sales team, with the marketing team, with the support team, because that's who needs this fed correctly. And if you get this wrong for them, then they won't use your data at all, because the KPIs will be wrong, and then you're back to square one. So, yeah, you should get in the room. Now, what I recommend is don't try to boil the ocean. Right? Start small.
Get in the room. A lot of marketing and sales teams might not think the data team is there to help them, but you can come in with a single KPI. And then from there, you now have a relationship, you're helping, and then you can start to listen. You can start to ask the correct questions. And data teams are really good at, you know, the five whys, and kind of getting to the precision of: what do you mean by a user? And a lot of times, let's call it, marketing, sales, and support teams don't know how to express it; they just kind of feel it. And so, you know, a data person getting into the conversation there to figure out what exactly you mean by a company, and then distilling that into a set of SQL transformations, is super important and super useful.
And that's a lot more fun, I would say, as a task, and much more leveraged, than just waiting for them to email you about, like, a broken dashboard and then, you know, trying to figure out what the bug is in the dashboard. Right? It's a lot more interesting to think about these fundamental pieces to which all other KPIs can be attached. Yeah, I think that's probably the largest work to be done. The rest is just creating good collaboration processes. Here, it's like, don't get exotic, you know. Use a bug tracker. You know, if you can find tools that span the two teams, something like Census, then great. Right? Because then they can use it and get alerted, and you use it and you get alerted, and everyone's happy. But, yeah, you should probably have some kind of bug tracker, you should have some kind of standing meeting, you should maybe have a wiki for when you ship new KPIs. One of our customers is this company Loom, that does videos, right? Their whole shtick is, like, async video.
And their data team, whenever they ship a new KPI, or a new table, depending, right, they will ship a video with it, so that the team can learn what the heck this KPI is. It's like a little explainer video, and it's, like, super useful. So these are the ways you can improve your collaboration, and you have to figure out how to do this in a leveraged way. Right? So you don't wanna have a meeting with every single person on the sales team to explain your latest, you know, product usage KPI that's generating leads. Maybe a little wiki, maybe a little video. These are all the hacks that I've seen people use that are super impactful.
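To make the "what is a user across three products" problem concrete: here's a hedged sketch of the email fallback he mentions, with invented table names, and with the same caveat he gives, that email may not be the right identity key for a given business.

```sql
-- Hypothetical unified-user model spanning three independently adopted
-- products, deduplicating on lowercased email as the join key.
with all_signups as (
    select lower(email) as email, 'product_a' as product from product_a.users
    union all
    select lower(email) as email, 'product_b' as product from product_b.users
    union all
    select lower(email) as email, 'product_c' as product from product_c.users
)
select
    email,
    count(distinct product) as products_used  -- e.g. uses A and B, pitch C
from all_signups
group by email;
```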
[00:45:42] Unknown:
And to your point of releasing a new KPI or a new table, what is the workflow for a team that's using Census, and what are the opportunities for some sort of CI/CD workflow to actually validate the changes that you're making before you go live with them?
[00:45:57] Unknown:
Census is actually a CD tool. Right? It is a deployment tool. It's just that, again, given the evolving way in which the data world and the business world are learning things, you kinda don't lead with that right away. But that is what I think the job of Census is: to take data and turn it into a product, right, which you're deploying, and which is well tested. And it is a product because the whole definition of a product, in a way, is that it's not single-use. Right? It's not a single dashboard that you're making. It's potentially a single KPI that is touching ten tools. So it's very important that you validate it, test it, and scale it out. So the way Census works is, if you build a new KPI, we automatically ensure a few things. Right? One is you can think of syncs themselves in Census as entities that go through a kind of draft, stage, commit, deploy set of states. You're able to edit mappings and see what would be the effect. Would this change the schema in Salesforce, because you've now added a new field, or is it just going to change the type definition? Like, you can see all those things.
And then so you can kinda get a preview of the change that you will effect. Right? Then there is the compatibility test. Right? So we will prevent you from saving a sync, right, from deploying or committing a sync that has broken types. Like, if you're trying to map, you know, a datetime into a Boolean, that's not gonna work. Right? So those are all built in. And you could think of those as compile-time checks. Right? And then there are runtime checks. So let's say you add a new KPI, but you didn't realize that some of the time, like I said, it could be null. Right? But we will prevent you from syncing those nulls. Right? So, like, those rules are embedded in Census. And so at the end, you'll get, like, an alert that says, hey, a million rows, you know, were synced, but out of those, 55,000 failed the null test, and so those were not sent over. And now it's up to you to deal with that, right, to do with it as you will. Maybe that's fine and you just cut them out; they were never meant to be part of the query. Or you fix it. Right?
[00:48:02] Unknown:
In terms of your overall ideas and assumptions about what would be involved in actually building the Census product and what was needed in the operational analytics space, what are some of the assumptions that you had going into this venture that have been challenged or changed as you started to iterate on the product, work with your customers, and understand more of the nuance of what's actually involved in building this project?
[00:48:29] Unknown:
I think when we first started out, the first thing that we had to tackle is that it was not expected in any way for the data team to be involved in this side of the house. That was a real kind of, I'll call it, existential dilemma in the first year of Census. No one remembers these things, but in, like, 2019, it wasn't normal to have the data team in the mix. And the companies that used Census back then were very sophisticated, I'll call it. Both sophisticated and not super big. So there was a lot of trust still in the company, right, across teams, and across the data team and the rest of the team. And how to scale that out was something we had to learn. How do we teach other companies that this is okay? That not only is it okay, it's actually the right thing? Right? So I think that was the first assumption that we had to kind of question. We thought this was intuitive, but the customers didn't think this was intuitive, right? To say, you know, the warehouse could be in the mix here. So then there were two problems with that. Right? One is marketing teams are going, like, why would I work with my data team? Like, I have my own thing.
And then data teams, intuitively, and, again, it's hard to remember this, but the idea of the warehouse as a source rather than a sink for data was bizarre in 2018, 2019. Truly bizarre. They're like, that's not what a warehouse is for, which is true. Historically, that's not what a warehouse was for. But you had to see the direction in which Snowflake was going, right, and BigQuery and everybody else, to be able to be operational hubs. Right? It was starting to become possible. But if you grew up in CS or in data, yeah, the warehouse was not meant to do this at first, and we've come a long way to making that possible. So that was a lot to overcome: to say, hey, it's actually super cost effective and totally reasonable to treat the warehouse as an operational hub, rather than just, you know, a sink for analyzing billions of rows at the end of the quarter. And then finally, I would say there is a lot of misunderstanding and there are missed expectations around what people call real time.
This was another thing we had to tackle, which is, when you're first working with a marketing team or a marketing operations team, they're used to everything happening instantaneously, because the original way they've wired these tools is to be, you know, kind of instantaneous. But they're often getting bad data because of that. Right? They're not aggregating it correctly, and they're not getting all the cleanup that the data team has done. But then when you connect Census to your warehouse and out to a marketing tool, there is a delay now. Right? It's not true real time, not capital-R real time. And the good news is that's evolving, that's changing, that's improving.
I think we're gonna see the worlds of real time and warehouse-style batch continue to collide over the next few years, which is great. But in the early days, this was, again, an assumption we had to fight, right, which is that the data will be arriving slower, and users felt that wasn't okay. So I had to teach users, and we had to build facilities in Census for what I'll call fast and slow data. We created two kinds of pipelines, because your entire warehouse and dbt workflow cannot be run in, like, sub-second latencies. It's just not possible. So I taught users, and built features into Census, that say, here's, you know, 80% or 90% of your data that comes in at the slow speed, which can still be sub-minute, but not sub-second. And then if you need sub-second, here's your alternate path, and Census can participate in that so that you can make sure you're still getting some unification.
So a lot of the early fights with users were about helping them understand. And again, some of this is technical, like, we had to build things, and some of it was helping them understand their needs. Like, oh, you need real time. What do you mean by real time? And it turns out, if you tell them, how about 60 seconds? Oh, that's perfectly fine. So you see, you might have spent months trying to build real-time capability when all they needed was, you know, a minute, which is marketing real time. Do you know what I mean? So those were a lot of the assumptions that we had to debunk for ourselves in the early days. Continuing on the question of real time and streaming and being able to push the data as it's being generated, I'm wondering what you see as the potential future for streaming architectures in the space of operational analytics and some of the technical
[00:53:06] Unknown:
and organizational barriers that prevent that from being something that we're realizing today? Yeah. I think the fundamental constraint today is
[00:53:15] Unknown:
that true real time is on a separate stack altogether. And so, remember we talked about this? Like, we said, hey, you need a unified definition of what is a user or what is a company. Right? Those transformations ideally should not be duplicated, because that's really hard to maintain. And some of them may even require, you know, batch data to be able to determine what those aggregations are. And so that's probably the fundamental limitation. You can do real time, but it's partially independent. Right? There are bridges between the two that you can create, but it's kind of independent.
And so that's what people do in Census and in the market today. They kind of have a fast path and a regular path, and the goal is to try to keep the logic between them as simple and as separate as possible, so every time you make a fix, you don't have to go fix it in two places. I think over time, the best thing that can happen here is for all the warehouses to continue to improve their real-time capability, right, their streaming capability, whether that's through streaming materialized views with products like Materialize, or just faster, streaming ingestion of data into the warehouse. All these things will get us closer. I think there's a speed-of-light problem, of course. Right? So I think engineering teams may always rely on a different stack for kind of pub/sub systems at high scale. But for most of the rest of the business, as the warehouses and some of these tools that bring streaming to that stack emerge, we'll probably get good enough to be within a second or a few seconds. That should get us what we need.
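To make the streaming-materialized-view idea concrete, here is a minimal sketch in Materialize-style SQL. The pageviews source and its columns are assumptions for illustration; the point is that the view is kept incrementally up to date as events arrive, rather than recomputed on a batch schedule.

```sql
-- Illustrative only: Materialize keeps this view incrementally up to
-- date as rows stream in, instead of recomputing it in batch.
-- The pageviews source and its columns are hypothetical.
CREATE MATERIALIZED VIEW user_event_counts AS
  SELECT user_id, COUNT(*) AS total_events
  FROM pageviews
  GROUP BY user_id;
```

A downstream sync could then read from user_event_counts with far lower staleness than a nightly batch model.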
[00:54:57] Unknown:
In terms of your experience of working with your customers and seeing the applications of Census, what are some of the most interesting or innovative or unexpected ways that you've seen it used? Oh, that's a great question. It's like you keep seeing new ways people use it.
[00:55:11] Unknown:
So, one of the neatest things I've seen. You can think of the first couple of scenarios people do when they get started, right, as taking one or two KPIs, you know, like usage of your product. It's a classic: the sales and marketing teams always want to know exactly how people are using the product. And it might be some of the most basic KPIs, like, how many times have they used the features, or how many times have they logged into the product? Sometimes it's really basic KPIs like that. But it's a signal. Right? And so you add that to a user or a company in all those tools, and then your teammates no longer have to open two tools to figure out what's going on, and they can talk to the customer in a better way.
So that's kind of like level one. But what you then see people do, once they have this kind of power in hand, is start to shift what I'll call logic that is very proprietary and imperative in tools like Salesforce, Marketo, etcetera. They actually move the logic into SQL and into the warehouse, which is both a more standard, more open language, right, than learning Apex, which is the programming language of Salesforce, and more functional rather than imperative. Right? It's not do this, and then do this, and do that. They have these workflow builders and process builders in Salesforce that are very much imperative code in a way, even if they have a UI.
But SQL, in some ways, aside from Excel, is maybe the world's most widely deployed functional language. Right? And what I find our customers doing is taking hand-tuned processes that they wrote in Salesforce and moving them into the warehouse, because they have their own playpen. Right? Like, the data team has their playpen of, here's how you generate the people data mart, etcetera. And then you might have a sales ops person who's in there and getting more technically savvy, and their job is to automate everything on the sales side. Right? They have to do all these things, and they'll start to create really interesting business workflows in SQL. So it's not at all what you think of as what a data team would do, which is to build a KPI. What they're doing is saying, well, I wanna do an assignment rule, and Salesforce gives you convoluted ways to do this. It's hard to maintain, but it exists.
But then they're like, wait, the inputs to that rule are already in the warehouse. So why don't I just express that as a model, a transform, and generate a new, you know, column in the data? And then that's how they'll do assignment. There's one user I saw recently do this, and it's super neat. Right? They're doing work assignment for salespeople in SQL. They have the whole sales team in SQL, because it's all ingested, and they have all these KPIs, all these things, and based on that, they're basically taking a bunch of rows and saying, this is the sales owner, this is the sales owner, this is the sales owner, and they're doing that in SQL. It's super neat, right? It's more maintainable.
If you fix it, it just works, because it's gonna recompute the table as a whole. If you change your rules, you don't have to figure out, what do I need to rerun? There's no such thing in Census. Right? If you change the state, if you change the table, we will synchronize. That's the whole point.
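As a hedged sketch of this pattern, an assignment rule expressed as a warehouse model might look something like the SQL below. The table, columns, and routing rules are hypothetical stand-ins, not the actual customer's logic.

```sql
-- Hypothetical assignment rule as a declarative transform: each run
-- recomputes the owner for every account, so changing a rule never
-- requires working out which rows to rerun.
SELECT
  a.account_id,
  CASE
    WHEN a.annual_revenue >= 1000000 THEN 'enterprise_rep@example.com'
    WHEN a.region = 'EMEA'           THEN 'emea_rep@example.com'
    ELSE 'smb_rep@example.com'
  END AS sales_owner
FROM analytics.accounts AS a;
```

A sync would then push sales_owner back into the CRM, replacing the imperative workflow-builder version of the same rule.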
[00:58:21] Unknown:
So that's probably one of the most unusual, interesting things I've seen recently. And in your experience of building the business and growing the product and helping to be one of the defining companies in this space of operational analytics, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:58:39] Unknown:
Ultimately, you know, I think tools are about making people more awesome versions of themselves. Right? And so I think a lot of the interesting problems are in people and team design and company design. Going back to your first question, I think the ultimate way to describe the value of Census is that it creates a feedback loop. Right? It's not just saying, here, put some KPIs in Salesforce. It's: build a complete feedback loop from the action to the source data and back. Right? And the hardest thing is helping data teams and executive teams understand that they should invest in this approach.
And this is gonna piss some people off, but I'd say the biggest problem that I've come across, and that continues to be a huge, huge, huge pain and will be a pain for a while, is that most data teams, or most business intelligence teams, report to a CFO. Nothing against CFOs, but their job is kind of compliance and accounting and getting the numbers right, which is very important, and that's why the data teams have the best data. But it's not like a growth team, which is all about feedback loops and increasing the velocity and scale of a company. And I think that's what data teams should be modeled after. They should be modeled after growth teams, and they should be bigger, more powerful, and more valuable than them, because growth teams are hyper focused on product growth and, you know, A/B testing the website. And if you're Facebook, that's basically everything. At most companies, data teams have the broadest purview.
You know, if you can get the executive team to realize that this is an investment and this team should be one of your most valuable teams, then we've done our work. I think that's actually, in a lot of ways, the job I have as the company grows: to help companies realize this kind of shift, and it's nontrivial. Historically, a data team was never in the hot seat. Right? It was never in the critical path of business action. It was always a let's-reflect-back on how things went last week or last month. I'm excluding data teams that are inside the engineering org. Right? You know, the data team at Uber was really building petabyte Kafka streams, and those I consider different, but that's a very small set of companies that operate that way. The broad set of companies in the world don't operate that way. And so my goal is just to try to figure out how to help them see the data team as foundational, as almost a platform rather than an end user.
[01:01:15] Unknown:
And for people who are interested in exploring the space of operational analytics and being able to build those complete feedback cycles, what are the cases where Census is the wrong choice and they might be better suited either with one of the other off-the-shelf tools or building their own homegrown systems?
[01:01:31] Unknown:
I think there's a magical time when you're relatively small as a company where, at least on the operations side, most of your data actually sits in one tool. And it's kind of a golden age if you can do that. Right? There's a period of time when you can almost say everything's in Intercom. You know? At my last startup, there was a while there where there was no need for a warehouse. There was no need for those things. 95% of what we wanted to do was just in Intercom, whether that was messaging customers or looking at our metrics; it was all in there. It was really, really powerful. That obviously breaks down inevitably. Right? Either because your business becomes more complex or because of your scale, one of the two. So the first thing I would say is, if your company, if your product is still simple enough that most of it can sit in a tool like that, then you should not focus on almost any aspect of, quote unquote, the modern data stack. You should focus on getting users and improving activation, improving retention, etcetera.
And then once you have that, of course, you're gonna invest in the modern data stack. Right? So when it comes to the build-versus-buy decisions there, I think it's becoming increasingly unlikely that people should build this themselves, should write this code themselves. There are always scenarios, right? Like if your tools are very bespoke. But even there, you know, we support custom destinations in Census, so you could just use our framework to connect to your custom stuff, and then you'd still get all of our monitoring, all of our incrementality, all these things that you really want. So I think historically there were reasons to build this yourself, but I think they're diminishing.
I mean, historically, another reason to build this yourself would have been, you know, compliance and security practices. But again, the way Census is designed, we don't store your data. We're SOC 2 and HIPAA compliant, all these things. The data is homed exactly where you want it to be, so we're not even shifting what locales your data is stored in. So a lot of those reasons go away as well. I guess if you have on-prem, if you have a mix of cloud and on-prem, I would say maybe get off on-prem, but I think that's a reasonable situation in which you should probably do some of this yourself. But you should still be trying to build this feedback loop. I would say that people should not wait until they're on the cloud to build those feedback loops. But I have seen companies for whom it's faster to just say, screw it, let's move to the cloud, and then we'll be able to use these off-the-shelf tools.
So that's another good reason you might wanna build this yourself. Yeah, those are probably some of the reasons I would state for doing this. And then maybe you have extremely intense real-time requirements, and for that, you need to put this into your real-time infra, and Census doesn't fit into your real-time infra. Again, though, that means it's almost certainly the engineering team that owns this rather than the broad data team, and it means you're also not dealing with a classical data warehouse. So you're already in a pretty different situation there. And again, I think a lot of companies get enamored with that, and they don't realize the burden of managing it is actually very, very high. But that is probably another case where you might wanna do it homegrown.
[01:04:44] Unknown:
And as you continue to build out the Census product, what are some of the things you have planned for the near to medium term? So, you know, the core
[01:04:52] Unknown:
goal of the company is to help businesses do more with their data. Today, like I said, we're trying to help these data teams who have made these investments in moving to the cloud, who have implemented great BI stacks, and get them to serve those insights directly to the business teams. Right? That's what we're trying to do today: extract so much more value out of the work that data teams have been doing. And then I think there's a couple of things that we wanna work on next. One is to enable more nontechnical people to take action on that data. Right? You and I talked a bunch about this: how do we help those people come in and participate, and participate in such a way that they're not just consuming the charts, but they're actually generating new insights themselves and owning KPIs that they care about, all of which fits into the overall data lineage?
And just ensuring that you never screw up, right, that you never push bad data, and really focusing on those kinds of things. And beyond that, you know, let's talk again next year.
[01:05:45] Unknown:
Are there any other aspects of the overall space of operational analytics and the work that you're doing at Census that we didn't discuss that you'd like to cover before we close out the show? I think we covered most of it. I think the
[01:05:56] Unknown:
key idea here that I think we got across together is that operational analytics, and I don't even know if space is the right term, it's really a direction and an outcome for data teams, right, is to say: you are no longer just on the receiving end of all the, let's call it, all the crap that comes down from management and from other teams; you are now actually driving certain functional teams in the company. I have yet to meet a data team that isn't frustrated by the ratio of time that they're reactive to the needs of their stakeholders versus proactive about building new, interesting functionality. And the goal here is to say, let's move you from reactive further along the spectrum towards proactive, and help you impact the teams directly and be, you know, idea generators for those teams. And I think that's
[01:06:55] Unknown:
really the journey that we're on here with our customers: how do we help make that happen? Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I think the most exciting thing continues to be, like I said, reducing the latency in the warehouse stack. I think
[01:07:18] Unknown:
if latency stays high, there will always be two stacks. And so I would really like, you know, Snowflake, Databricks, BigQuery, and all these folks to continue to invest in that. Even though they're analytical databases, right, that were not originally intended to be used for low-latency workflows, now they can see the light: there's an opportunity for these warehouses to be hubs around the entire business. So the biggest gap is just continuing to tighten the latency
[01:07:49] Unknown:
on those tools, and I think that would have a huge impact on the entire market. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Census and your perspective on the overall space of operational analytics and the potential benefits that it can have for organizations. It's definitely a very interesting space and one that I'm excited to see continue to grow and evolve. So thank you for all of the time and energy you've put into that and your work at Census, and I hope you enjoy the rest of your day. Thank you. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Boris Jabes and Census
The Genesis of Census
What is Census?
Operational Analytics and Actionable Intelligence
The Rise of Operational Analytics
Data Quality and Lineage in Operational Analytics
Linking Ingestion and Operational Data
Implementing Census
Data Modeling and Collaboration
CI/CD Workflow in Census
Assumptions and Lessons Learned
Future of Streaming Architectures
Innovative Uses of Census
Challenges and Lessons in Building Census
When Census Might Not Be the Right Choice
Future Plans for Census
Closing Thoughts on Operational Analytics