Summary
The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result, there have been a number of trends in data that take advantage of the warehouse as a single focal point. Among those trends is the advent of operational analytics, which completes the cycle of data from collection, through analysis, to driving further action. In this episode Boris Jabes, CEO of Census, explains how the work of synchronizing cleaned and consolidated data about your customers back into the systems that you use to interact with those customers allows for a powerful feedback loop that has been missing in data systems until now. He also discusses how Census makes that synchronization easy to manage, how it fits with the growth of data quality tooling, and how you can start using it today.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/impact today to save your spot at IMPACT: The Data Observability Summit, a half-day virtual event featuring the first U.S. Chief Data Scientist, the founder of the Data Mesh, the creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 to RSVP with this link will be entered to win an Oculus Quest 2 — Advanced All-In-One Virtual Reality Headset. RSVP today – you don’t want to miss it!
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.
- Your host is Tobias Macey and today I’m interviewing Boris Jabes about Census and the growing category of operational analytics
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Census is and the story behind it?
- The terms "reverse ETL" and "operational analytics" have started being used for similar, and often interchangeable, purposes. What are your thoughts on the semantic and concrete differences between these phrases?
- What are the motivating factors for adding operational analytics or "data activation" to an organization’s data platform?
- This is a nascent but quickly growing market with a number of products and projects operating in the space. How would you characterize the current state of the segment and Census’ position in it?
- Can you describe how the Census platform is implemented?
- What are some of the early design choices that have had to be refactored or augmented as you have evolved the product and worked with customers?
- What are some of the assumptions that you had about the needs and uses for the platform which have been challenged or changed as you dug deeper into the problem?
- Can you describe the workflow for a customer adopting Census?
- What are some of the data modeling practices that make it easier to "activate" the organization’s data?
- Another recent trend in the data industry is the growth of data quality and data lineage tools. What is involved in using the measured quality or lineage information as a signal in the operational systems, or to prevent a synchronization?
- How can users test and validate their workflows in Census?
- What are the options for propagating Census’ runtime information back into lineage and data quality tracking?
- Census supports incremental syncs from the warehouse. What are the opportunities for bringing streaming architectures to the space of operational analytics?
- What are the challenges/complexities in the current set of technologies that act as a barrier?
- What are the most interesting, innovative, or unexpected ways that you have seen Census used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Census?
- When is Census the wrong choice?
- What do you have planned for the future of Census?
Contact Info
- Website
- @borisjabes on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's a t l a n, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's l i n o d e, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Boris Jabes about Census and the growing category of operational analytics. So, Boris, can you start by introducing yourself? Yeah. Hi. Thanks for having me. So, yeah, I'm the CEO over at Census,
[00:02:09] Unknown:
where we've been building what people call a reverse ETL tool for the last few years. We were founded in 2018. I'm a lifelong kind of tool builder. Before this, I ran a company that helped people manage passwords and kind of deal with employee identity. And before that, I worked on Visual Studio, you know, the godfather of programming tools. And do you remember how you first got involved in data management? Yeah. So in 2017 and 2018, there were a couple of things that were happening to me and my team, and to the people we were working with. We were kind of always obsessed with customer data in some form or another.
And I think the earliest kind of signal or experience we had that caused us to build Census was actually a few years back. I sold my last company and started working across Boston and San Francisco, collaborating with, you know, sales and marketing people that were not on my team. Right? It's a very standard thing that happens in an acquisition. And it wasn't going that well. And I did the classic thing at the time, which is to blame the culture and the people, because that's easy. And the truth is, we were a product-led startup. We were very, like, self-serve software. You know, we had an operator console and all these things. Sitting in San Francisco, I could train everyone on my team to be able to look stuff up.
And the truth was that, you know, these other sales and marketing folks didn't have access to that. And so they didn't know how to message our customers. They didn't know how to do, you know, useful outreach based on what they were doing. And so that's actually where the background thread for Census began, which was: maybe the issue is not just cultural, but it's actually a connection problem between product and, let's call it, analytics teams versus sales, marketing, and support teams, and how they live in different tools, and maybe there's something there that's missing. And so let's call that the germ of Census, right? That's where it kind of began. And then as I started digging into this, you know, it really just hit a weird part of my brain that is completely obsessed with what I'll call certain kinds of data federation problems.
So my first company and Census are both actually solving a very similar problem, which is that in the world of SaaS, where everything has expanded onto the Internet and everyone's been using software as a service for the last ten-plus years, we now have just an explosion of software as a service inside a company. I think one of our customers has over 300 apps that they use, right? Independent web apps. And there are all these things that occur because of that that are chaotic and actually make things worse. So one is, you know, security and logging in and employee identity. That was my first company. And then customer data is just kind of replicated independently in all these products.
And so my brain went to, you know, this necessary, better design, which is that we need this centralized and federated somehow, because you really can't have a plethora of tools dealing with the same nouns and schemas, yet have them rebuilt from scratch independently and then wired together kind of ad hoc. And so that's how Census was born, and that's why we named the product Census, right, to solve that problem. So that was about 2017, 2018.
[00:05:45] Unknown:
And so for people who aren't familiar with Census, I'm wondering if you can just give a bit more detail on what it is that you're building there. And you've already given some of the story behind how it came about, but maybe where it sits in this overall sort of nascent space of operational analytics versus reverse ETL.
[00:06:02] Unknown:
So Census at its simplest is a tool that connects to a data warehouse, a cloud data warehouse, and allows you to extract models, insights, metrics that you have in your warehouse and sync them seamlessly into tools that people use to actually do their work every day. So Salesforce, Marketo, Zendesk, etcetera. So this is, you know, going beyond what people do in a BI tool, where you're looking at data and you have really amazing charts, and actually pushing data directly into systems of action, where a salesperson or a marketer or a support person is actually doing their work every day, in their pane of glass. Right? So that's what Census does. And this is what people would now colloquially call reverse ETL as a tool. You can think of how that works as being about connecting to a warehouse and moving data seamlessly out. Right. So that's why this term reverse ETL emerged.
I don't like to obsess about the fact that it's kind of a misnomer, since ETL has no direction, but that's roughly what reverse ETL means. It's the tool, the process, by which you get data from a warehouse seamlessly, efficiently, you know, incrementally, out to the destinations, the tools that people use to do their work. Now, operational analytics, you could think of as what that enables, the methodology that comes out of using a tool like this, which is the idea that analytics is actually at the center of all your operations, rather than the two being independent entities in a company. That's actually what's important here about what we do. It's less about moving the bits around, actually, and much more about helping data teams position themselves at the center of their company, and specifically at the center of all these kind of functional workflows that are driving the business.
That's kind of what I think of as my life mission, or the company's life mission, which is to say, instead of a data team being this quarterly, retroactive look back at, you know, how much revenue did we make, instead they should be drivers of, you know, how do we increase revenue, or how do we improve our support, or how do we better reach our new customers, right, etcetera. And data teams are pretty much the only team in the company, right, that has access to all the information that's happening, from inside and outside. Right? And so they're actually uniquely suited to being the broker to enable all this. You know, generally, I find it frustrating to watch our users or prospective users be stuck kind of just dealing with the request of the week, the request of the day. Please fix this chart. Please make me another chart. And I think they have just a lot more to offer, because they can think in terms of what are interesting KPIs. They can invent new KPIs and then push them out to the rest of the organization.
And not just push them through a picture that people look at in a board meeting, but actually push them directly, all the way into, you know, the pane of glass that every single person sits in in the company.
[00:09:11] Unknown:
Yeah. And one of the recent trends, maybe within the past five years, in business intelligence is the recognition that just seeing the numbers tell you something isn't really useful, because then you have to actually take some action based on that information. And if there's no way within that context to actually take that action, then the whole feedback loop becomes disconnected, and you have, you know, multiple different activities and no concrete way to tie them all together and actually complete that loop and have those feedback cycles. And I'm wondering what you're seeing in terms of the space of operational analytics and its relation to that idea of actionable intelligence, and not just business intelligence, and maybe some of the ways that you're able to use something like Census to push this data into these operational systems and have that feed back, that cycle, into the business intelligence dashboards and the other key performance indicators that the business is tracking?
[00:10:09] Unknown:
Yeah. So, I mean, I don't like to hit people over the head with, you know, the kind of Gartner-style feedback loop picture that you would draw here, but you're completely right. That is the goal, right, to improve your feedback loops. And that's arguably the goal of all business and most human endeavor: to get better through iteration. And so how do you do that? You do that by tightening the iteration time. Right? Most products and companies that perform better do so because their iteration time is shorter, so you can close the feedback loop quicker. Right? This is why we even invented all these, like, A/B testing tools for websites: to get the loop happening faster. And two is you've gotta, you know, wire the entire thing together such that you can actually measure an objective function that you can improve. Right.
And so, you know, I think the state of the art at most companies before we came around was: work very hard to come up with KPIs, which are kind of like an objective function, and they're not trivial to build. Right? Because you have to clean the data correctly. You have to define the KPI precisely. Right? And then the loop was maybe do a project and then come together in time, let's say at the end of a week or at the end of a quarter, to observe the KPI and then determine, did what we do affect that or not? And was it causal or not? And then have this kind of human debate and decide whether we should adjust direction or not. Right. And with Census, I have seen this firsthand. It has shifted a lot of these things, because what you're doing is something more like... let's take one of our classic scenarios that people use Census for, which is in these kind of high-scale software companies, they have tons of free users. Right? Tons and tons of free users, like millions, if not tens of millions, of free users.
And you can't literally pick up the phone and call every single one, right, to say, hey, are you interested in our, like, enterprise feature? You know, you can't do that. It's simply not scalable. And so you're gonna create some kind of metric, right, for determining which users might be interesting to call. So that's a kind of query. Right? That's a kind of KPI. It's actually often lots of variables that come into play. Right? It'll be things like how actively they are using features A, B, C; how big they are; how big their company is. Right. All sorts of things that are 100% bespoke to our customer, right? Not anything we could build a priori.
And you'll then tell the sales team, like, hey, these are the people to call. Right? And then you'll hope that that works out. But with Census, what people do is they'll take that model, right, of, out of our free users, who do we deem to be attractive, and automatically sync that, right, to their sales team. So that literally generates leads and opportunities, in kind of sales parlance. And so there's a list of people to start calling. Right? The salesperson can just go. They don't even have to wonder, like, why this user is at the top of their list. It'll tell them. Right? It'll say, they're active in this way, this is the champion at that company, and so on and so forth. And so they'll just start making the calls. And of course, you're re-ingesting all that CRM data and billing data back into the warehouse.
You can track on a cohorted basis, on a time basis, how are we converting the leads that we generate? The data team can do all that, and do it in an automated way. And so you can start to tweak. Like, realize that, you know what? Our win rates are actually pretty bad, but if you slice it by employee count, it's really good above this number. So just tweak the KPI to say, okay, companies greater than this size. And now you've increased your win rate, you've reduced the busywork, and so on and so forth. Right? Or vice versa, you could kind of open up the spigot if things are going really, really well. And all of that is happening with almost no human intervention, no meeting required, and no, like, strife between, you know, the data team and the sales team. It's just, we can see what converts.
And there's a lot more you could do here. Once you have all that wired, there's actually more that people could do than they are doing. And, like, I've got tons of ideas for them to improve that even further, right? But that's the kind of feedback loop you want, and that's just one out of, you know, dozens that people are able to achieve here. So when we first went out to market, if you go back to our old materials from, wow, more than a year and a half ago, it was very much about this idea that we're the last mile in the data stack, because it completes that feedback loop. Right? It's like, you have tools that perfectly, you know, ingest data from tools like Salesforce.
And what we do is take the work that the data team does, which is massive and complex, and push it back out, and now you have a loop. Right? I just don't like pitching a loop to end users, because it's not what they're thinking about in the morning.
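To make that loop concrete for readers: here is a minimal sketch of what such a lead-scoring model might look like as Postgres-flavored warehouse SQL. Every table, column, and threshold is hypothetical; the point is that the scoring logic lives in the warehouse, where it can be tuned as conversion data is re-ingested from the CRM.

```sql
-- Hypothetical "product qualified lead" model; all names and thresholds are
-- illustrative. Scores free users by recent usage and company size, keeping
-- only the accounts deemed worth a sales call.
select
    u.user_id,
    u.email,
    c.company_name,
    c.employee_count,
    count(e.event_id) as events_last_30d
from analytics.users u
join analytics.companies c
    on c.company_id = u.company_id
left join analytics.events e
    on e.user_id = u.user_id
    and e.occurred_at >= current_date - interval '30 days'
where u.plan = 'free'
group by u.user_id, u.email, c.company_name, c.employee_count
-- These thresholds are exactly the knobs described above: once CRM conversion
-- data flows back into the warehouse, raise the employee count floor if win
-- rates are only good above a certain company size, or loosen it to open the
-- spigot.
having count(e.event_id) >= 50
   and c.employee_count >= 100;
```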
[00:15:01] Unknown:
And so digging deeper into this concept of reverse ETL and operational analytics, you mentioned how you take issue with the term reverse ETL because there is no implied directionality to it. But, you know, historically there has been one direction, as you mentioned: from your operational tools into your central data system, so that you can then perform analysis, put it up on a pretty dashboard, and the job's done. I'm wondering if you can just talk through some of your thoughts on why the past couple of years have been the time period where operational analytics and this whole category of tooling has kind of spontaneously sprung up from a number of different sources.
[00:15:39] Unknown:
So when Census was born in 2018, you know, we just provided this tool that connected to a warehouse and, you know, pushed data out to Salesforce, etcetera. And our users kind of spontaneously referred to it as a reverse Fivetran, which I think is a very reasonable metaphor, because it had the ease of use of Fivetran. Right. And it did go in the opposite direction. So I think that's a very reasonable description. And then as the market grew, people started to use this broader term, which is slightly less precise, but the core idea is still there. But really, like I said, reverse ETL is just the how. Right? It's just how bits move. I think the rise of operational analytics, which I'd like to think we helped catalyze, but which was gonna happen regardless, is that teams are trying to get more value out of their data. Like, the early years of the, let's call it, modern data stack were first and foremost just getting to the cloud. Right? Step one, get to the cloud. And these transformations take time. Right? Snowflake took a long time to become the obvious thing that we now see today. It's an overnight success ten years in the making. Right.
And so I think that was the majority of what people were trying to solve for. And then getting to efficient BI, I think we're still on the journey there, in terms of having every company have really, really brilliant BI, both in terms of visualizations and the kind of organization of the data itself, to be more useful to the broader set of people in the company. But I think once you've made those investments as a data team, you start thinking about how to have more leverage. Right? And I think we came in at a really great time for a lot of companies, where they had small data teams who had made these amazing investments in cloud data infrastructure that gives you a lot of capabilities that are not too hard to deploy and manage. And we gave them this tool that was also very easy to manage.
That meant you didn't have to ask an engineer to kind of wire up data for you, and it gave you dramatically more leverage. I'd like to think that in the arsenal of data tools, we might be one of the ones with the highest amount of individual leverage, right? Where you take a single data analyst, or an analytics engineer, whatever you wanna call them today, and you give them this tool, and they can suddenly impact the entire sales team, the entire marketing team, the entire support team. And this is what we saw happen at our customers. Like, the clout of our users went up dramatically. They started being invited to meetings they were never in. They started to kind of turn off some of the homegrown systems that, you know, marketing teams had set up, because they were wrong.
But it made sense for them to exist, because the marketing team was craving this information and didn't have the data team working with them, so they just did it on their own. And so I've seen data teams that we work with first start to add a little bit into these destination tools and then slowly but surely just take over the entire pipeline, both making the company's systems as a whole cleaner, because it's all going through one, you know, central clearinghouse that is the warehouse, and increasing, like I said, the power of the data team, and that's been really exciting to watch. But I'd say that's why you see the rise of operational analytics: because people have invested in great infrastructure and they wanna get more out of it.
[00:19:18] Unknown:
To your point of the, you know, homegrown solutions often being wrong, I'm wondering if you can talk to some of the ways that the other rising, nascent category of data quality tooling is able to factor into the space of operational analytics: being able to validate the flows that are being pushed into these operational tools and perform these quality checks, and some of the ways that things like the investment in lineage tracking, and being able to understand the upstream and downstream impacts of these transformations, factor into the ways that Census and operational analytics are being applied in the organization.
[00:19:51] Unknown:
First, let's not minimize this. The ways in which data is wrong in a company are myriad. And the first thing that I was pointing out was there are people dealing with data outside of the data team, and their errors are, you know, plentiful and outside of the kind of, let's call it, control of the data team. The phrase data lineage doesn't exist on that side of the house. So I think the first step I've tried to help companies achieve, right, is to say your marketing team or your sales team has actually built little micro data stacks of their own. They just don't manage them well. And they make mistakes with their data that are less bugs and more straight-up misunderstandings of the data. So they aggregate data differently than you do, because they don't know what the correct aggregation is.
You know, it takes time and expertise to understand, let's call it, dates and revenue over time, and it takes an accounting team getting angry at you to build the correct revenue KPI, which you might not do as a, you know, quick-and-dirty solution when you're connecting, like, your Stripe to Salesforce directly or something. And so the first thing is, by connecting the data stack out to the operational tools, you first benefit from bringing the data team's expertise and correctness about metrics out to, you know, those teams. So let's not underestimate that, because all those teams don't really know how to build KPIs correctly at all, and they'll approximate them, whereas the data team is at least, I'll call it, conscientious about building these KPIs semantically.
So the first thing you'd still wanna do is, like, invest in a tool like ours to get that data out. But then something very interesting occurs, which is what you're pointing out, which is if the data team is going to be in the hot seat for providing information into these systems, then the impact of being wrong on a metric goes up. This is the whole beauty of leverage. It's a double-edged sword. Right? You magnify every KPI's impact on the company. Therefore, if it's wrong, it's gonna be much worse. And so I think as the data stack has evolved to be more central to a company's operations, we've seen not just tools like Census emerge to connect it to that world, but also data quality and testing and lineage tools emerge to help make sure that data teams don't screw up, right, to make sure that data doesn't become wrong.
And, you know, I think one of my friends over at one of those companies makes a good joke that, like, if you've never provided bad data to someone, you haven't been in business long enough. Right? You haven't been a data person long enough, which I think is very true. And so there's a lot to this. Right? There's, like, what kinds of errors are we trying to catch, and what are the kinds of failures that occur? And so even here, I would differentiate between two categories of errors. There are errors in metrics themselves, which a lot of the data observability tools are helping companies manage, which I think is awesome.
So it might be due to the fact that your ingestion layer has failed to bring data in. It could be that your query became wrong because it didn't factor in, like, certain nulls that started appearing in the data. Right? There's a lot of ways in which your metrics can become incorrect. Right? And that's what these observability tools solve for. And the lineage, in my mind, just helps you debug that. Right? But then there's a whole other category of errors that I think tools like Census actually solve for, which may not be as obvious to people, but I think that is our job to be done here: there are a lot of ways in which your data quality goes down, not because your query is wrong or the shape of the data has changed, but because the wiring is not working correctly. Right? Or there's implicit wiring at a company. And so this is what we often see that people replace. Right? So, like, you have your homegrown script that, you know... here, I'll give you a great example. A lot of destination systems are not like a warehouse. They actually have a lot of transient, weird failures.
They can reject data for a lot of reasons. They might reject you putting data into them because, you know, a name field is too long, stuff like that. Right? And they'll just reject the entire row of data, and you won't necessarily realize that. So you start to get skew between your source of truth, which is the warehouse, and these destination systems. And that is something where a great tool like ours can make a big difference, because that's what we track end to end. Right? We can signal to you that there's skew being generated between your destinations and your source of truth, which is the warehouse. And that's kind of data quality in the day-to-day, in the wild. Right? Like, this is how you end up having bad data quality. It's not just that your KPIs have tests, whether that's dbt tests or using tools like, you know, Monte Carlo and Bigeye, etcetera.
It's because you're actually not getting every single data point to where it needs to go. And that's what we wanna make sure never happens. Right? So we'd like it to be that if your metric is correct, and you have observability tools to make sure your metrics stay correct, you can be assured that everything in the constellation, every satellite of the warehouse, has that data. And if it doesn't, we will also participate in what I'll call the modern data stack kind of observability tooling, which is to alert you when these things are starting to get out of sync.
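A hedged illustration of the kind of skew check being described: since the CRM data is re-ingested into the warehouse anyway, a data team can diff the warehouse's source of truth against the destination's copy. All table and column names below are invented, and `is distinct from` is the Postgres/Snowflake-style null-safe comparison.

```sql
-- Hypothetical skew audit: rows that exist in the warehouse model but are
-- missing or stale in the re-ingested Salesforce copy (e.g. because the
-- destination API silently rejected them).
select
    src.customer_id,
    src.lifetime_value as warehouse_value,
    dest.lifetime_value as crm_value
from marts.customers src
left join salesforce_ingest.contacts dest
    on dest.external_id = src.customer_id
where dest.external_id is null                                 -- never arrived
   or dest.lifetime_value is distinct from src.lifetime_value; -- drifted
```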
[00:25:26] Unknown:
Another interesting element of this whole paradigm of closing the loop, and this question of data quality, is that you're pushing data into these operational systems and then pulling it back out again to, like you said, close the loop and see how our categorizations of potential clients are panning out in terms of how they actually perform, and close that iteration cycle. You also have the data quality in the ingestion layer, where you're pulling data from those operational systems into your warehouse. How are you able to link up those in-and-out patterns and actually reuse some of these data quality checks, to ensure that the data you're pushing back into these systems adheres to the checks that you're trying to enforce on its way in?
[00:26:10] Unknown:
Totally. Totally. I think your mind is exactly in the right place. I think people talk about data lineage for good reason. Most people think about it very zoomed in, inside the warehouse. Right? Like, you have an Airflow task that does this, and you have a dbt run that generates some more models, etcetera, and so you have a lineage of how this occurs. But I think Census is actually really, really well suited to what I'll call end-to-end lineage, which is what you're talking about: a column changes, and that gets ingested by Fivetran.
And it ends up, through a series of transformations, feeding a field all the way back in your CRM. Right? And we sit in such a place that we can see the data arrive in the warehouse, watch it as part of a larger transformation workflow, and then push it back into the destination. So that's actually where it's most interesting to use Census: to think about the lineage from an end-to-end perspective. So the way we think about this today is there are conventions to follow, and then there is, you know, monitoring that we can provide. Right? So there are two things that I tend to recommend to people.
One is ensuring that you have some level of delineation within your warehouse between source data that is being pulled in from an ingestion tool, somewhat staged transformations, and then some relatively clear set of, let's call it, data marts that you are providing out to arbitrary tools. Right? And then we natively connect to dbt. Right? So we can see the entire model life cycle within dbt. We can see the dependencies between models in the warehouse and can monitor how tables are being populated. And so the way I think about this is, if you have a clean delineation, we can come in and tell you why something is skewing, right, let's call it, or suddenly things are no longer updating in a destination, or this field suddenly has a bunch of nulls, right, or a bunch of duplicates.
These are the things that people deal with all the time. So we tend to prevent those failures, and then you can kind of walk up the chain of transformations to understand why these things are happening. But to your question about how do I tighten the feedback loop between a metric in the CRM and the metric that is feeding it, you know, end to end, which I think is kinda what you're getting at: I'd say we don't do that very well yet, but it's definitely the kind of thing that we're always thinking about, which is, can we even build a straight-up feedback loop on a metric? Can you put an optimization goal on a metric and, you know, use the source metrics to feed it?
Technologically, Census is able to do that. We just don't have, like, a UI for that today. But I think that's the right place to go. I like what you're thinking.
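For readers, a sketch of the delineation convention he recommends, with invented schema and model names: raw ingested data, staging transformations, and a clearly marked marts layer that is the only surface exposed to a tool like Census.

```sql
-- Hypothetical layering convention; only the marts schema would be scoped to
-- Census, so raw and staging data can never be synced out by accident.
-- raw.*     : untouched landing tables from the ingestion tool
-- staging.* : cleaned and renamed, one model per source table
-- marts.*   : production-ready entities that are safe to expose

create view staging.stg_customers as
select
    id as customer_id,
    lower(email) as email,
    created_at
from raw.app_customers
where deleted_at is null;

create view marts.customers as
select
    customer_id,
    email,
    created_at
from staging.stg_customers;
```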
[00:29:20] Unknown:
And so now digging more into the Census platform itself, can you talk through some of the ways that it's implemented, and the overall architecture and implementation details of how you're able to build this system to pull from the warehouse and push into these operational tools, and make it approachable and accessible to people throughout the organization, and not just the, you know, very technical users on the data team?
[00:29:42] Unknown:
So Census connects to warehouses in kind of the standard way you'd expect. Right? You give it a role and access to the warehouse. As a data team, you can scope Census' access to a set of schemas. That way you can have more users in the tool, so that you can say, like, look, there are schemas that are just used by the low-level data team to, you know, transform stuff. There's a lot of staging data in here. Like, we don't want people to kind of accidentally pull this into a destination system, because it's really not production-ready. And so in Census you can scope, you can say, I want to expose these things, right, these sets of schemas, these sets of views, tables, etcetera.
And that way you can have more of a self-service kind of experience for the somewhat SQL-savvy, but, like, reasonable, you know, users on the other end that just want to pull the data that they need and put it in the tools that they're using. So that's kind of how you connect Census. And Census is a SaaS tool, so there's nothing to deploy on your side. And in fact, one of the things we built from the first day of the product is that we don't store any of your data. All of the data that matters lives in your warehouse. And that's one of my favorite features of modern cloud data warehouses: we use them as a backing store for Census, so that if you cut the cord, we have nothing from your company.
And in a world of, you know, this kind of secular push towards having more data ownership in companies, and a lot of privacy infrastructure being developed in every company that we work with, this is a really important aspect of the tool. It's important and invisible. Right? So it's something I like to tell people. And then the way Census works, right, is you choose models. We natively connect to dbt, so you can choose models out of your dbt project, you can write a query, you know, you can just pick a view, and then you have this what-you-see-is-what-you-get, point-and-click ability to map that to all these destinations. The way we connect to those destinations is using their APIs.
So we have standard credentials into those tools, and, you know, we can read your destinations and pull in all of the unique features of your company. So believe it or not, you know, those destination products, whether that's Salesforce or Zendesk or you name it, people have customized them. Right? They have added their fields. They've added their rules. And all of that we pull in and surface in our mapping UI, so that you can determine how you wanna map things. Right? And the biggest feature here that, again, is under the covers is that the most important thing for Census is not just to get your data out into those tools. Because that is hard. Don't get me wrong. These APIs are super complicated and fail in terrible ways. So just getting the data across is very, very hard, but you wanna get the data out correctly. Right? And so to your point about data quality, the core feature here is that we want to make sure you're not pushing bad data. And that comes in two ways. One is we sit at the end of your transformation flow.
So you can ensure that a sync in Census only occurs upon successful completion of, like, a dbt test. Right? So that's built into our capability. But beyond that, we have built an understanding of every one of these tools to make sure you don't push data that is invalid by the marketing team's definition or the sales team's definition. So for example, it might be very reasonable for the data team for users to have a null value on pricing. Like, you can have a pricing plan column and it's null, right? For the data team, that might be a very valid schema. Right?
But that might not be acceptable for someone who's using this data on the marketing team or on the sales team, and they need something else. They either need to cut those people out of the query or assign a default value of something else, and that is what Census allows you to do and guarantees that you don't screw up. Right? It's those kinds of last-mile validations, I'd say, that Census takes care of. And then the rest is the kind of, let's call it, ETL expectations that everyone should have, right? It always runs, it runs on a schedule, it can be triggered by workflow tools or orchestration tools. We monitor everything, right? So we'll tell you when things break, we'll tell you when things partially break, because you have different levels of breakage in this world, and things can break for all sorts of reasons. Right? Someone can delete the field that you're mapping in Salesforce, and suddenly, like, the data can't be pushed anymore. We'll notify people for those things.
You can notify in, you know, Slack and email, all these kinds of things you'd expect at this point. And the goal here is, you know, Census is a bridge product. Right? It's a bridge between the data team and the, let's call it, marketing ops, sales ops, you know, CS ops teams, all these kind of business operational teams. And so they'll get alerted too. And a lot of times, it's their job to fix it. Right? So take something as simple as you've taken a table of leads and you're syncing it into Salesforce opportunities. Right?
And a sales ops person changed some of the schemas in Salesforce, thus breaking the sync. Right? And I'd love to be able to make Salesforce participate in the data lineage tooling that we've built. Right? But we're not there yet. So what will happen is you'll change Salesforce, and Census will alert you. Whenever the next iteration of our syncs runs, you'll get an email, you'll get an alert in your alerting tool, whatever monitoring tool you use, and it'll say, hey, this sync is broken because this field was deleted, and you have to fix it. You can either delete the mapping, update the field, resurrect the field, whatever needs to be done. And very much, this is not just the responsibility of the data team. Otherwise, data teams would really not wanna use Census.
If everything related to breakage in every tool went to them, I think that would not be palatable. So very much the goal here is to say, when the marketing tool breaks a sync, the marketing ops team is notified, and the data team is notified just in case, you know, so they're aware. Right? But there's often a directionality of who should fix things when things are broken, and all of that is kinda built into the platform.
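A small sketch of the last-mile guard he describes, in warehouse SQL with invented names: rather than letting a null pricing plan reach the destination, the model feeding the sync either assigns an agreed default or cuts those rows out.

```sql
-- Hypothetical pre-sync model guarding against nulls the destination's rules
-- won't accept.
-- Option 1: assign the default value the marketing team agreed on.
select
    customer_id,
    coalesce(pricing_plan, 'free') as pricing_plan
from marts.customers;

-- Option 2: cut the offending rows out of the sync entirely.
select
    customer_id,
    pricing_plan
from marts.customers
where pricing_plan is not null;
```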
[00:36:10] Unknown:
And as far as the collaboration aspect, you mentioned that the, you know, the sales team or the marketing team can say that this value should never be null, or this is the default that I want. What is their interface for working with the data team to set those rules and maintain them in Census?
[00:36:29] Unknown:
So there are a couple of things, and certain sales teams or marketing teams use products that can enforce that, which is nice. So the correct, native way to do this is for them to go into, let's say, Salesforce. Right? And you can set fields. Right? Salesforce is really just a CRUD database UI. And so you can set fields to have these rules, right? You can say this is a non-null field. And if you do that, then Census takes care of the rest. We detect that rule and prevent that sync from succeeding, and thus, like, you might not even be able to save the sync if you're trying to sync incorrectly, or you'll get alerted for all the rows that fail the test, right? So Census basically takes in all the knowledge that you've put in there and computes on it, right? You could almost think of it as a compiler, right? It's taking what it knows about the destination, the source, every single row, and then determines which rows, you know, fit the mold.
And then you'll get alerted for the rows that don't. And, again, you can have the sales team or sales ops team be alerted, or the data team be alerted. So the most common way people do this is, like, push the logic into a tool that can implement that logic. And if not, then you'd wanna be able to build that rule as, effectively, a dbt test, which, you know, some sales ops teams can't quite do yet. So it kind of spans those two worlds. And then we're working on some things to make that a little more fun. That's what we're cooking up still.
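For readers unfamiliar with the dbt side of that: a singular dbt test is just a SQL file that returns the rows violating a rule, and an empty result means the test passes. A hedged sketch with invented names:

```sql
-- tests/assert_no_null_pricing_plan.sql (hypothetical dbt singular test)
-- dbt fails the test if this query returns any rows; a sync gated on the test
-- run is then held back instead of pushing bad data to the destination.
select customer_id
from {{ ref('customers') }}
where pricing_plan is null
```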
[00:37:59] Unknown:
Struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world's first end-to-end, fully automated data observability platform. In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing the time to detection and resolution from weeks or days to just minutes.
Start trusting your data with Monte Carlo today. Visit dataengineeringpodcast.com/impact today to save your spot at Impact, the data observability summit, a half-day virtual event featuring the first US chief data scientist, the founder of the data mesh, the creator of Apache Airflow, and more data pioneers spearheading some of the biggest movements in data. The first 50 people who RSVP will be entered to win an Oculus Quest 2. From the side of the data team, I'm wondering what are some of the common data modeling tasks that are useful or necessary to effectively leverage something like Census, and some of the communications that they need to have with the sales and marketing teams to understand the context of the information that they're working with? What are the questions that we need to be able to answer as we're delivering the data into these different tools?
[00:39:31] Unknown:
This is huge. I actually think this is a lot of the work to be done. I think over the next couple of years, that's where investments will pay off the most for data teams here. Because historically, right, and I don't mean ancient history, but, like, you know, relatively recent history, the primary goal of a data team, beyond moving to, you know, cloud data infrastructure, which is a means to an end, was to provide BI. Right? And most KPIs in BI are aggregated at the whole-company level. Right? So how much revenue did we make this quarter? Or by cohort, how much, you know, revenue did we make? And what is our, you know, net retention, etcetera, etcetera.
But for operational analytics, you actually have to move away from overall aggregated data and into cohorted data, all the way down to the individual. And that's the biggest change data teams need to make in their data models. That can be a significant shift. Right? Because your unique company IDs or user IDs might be disappearing pretty quickly in your data transformations as you get to, you know, taking all your invoice data and just aggregating it into how much money we made. You might not care from which customer. Right? And so that's probably where there's real work to be done that cannot be done for you. I don't think we're yet at the point where, you know, a magic Census AI can cohort your data down to the individual.
So that's where data teams spend their time to better work with Census, but probably more to better work with their sales and marketing teams, for whom everything they do is at the individual level. Right? An individual could be a company or a user. Right? So you have to think about that. What is a user? Is a user an email, or is a user a database ID? But more importantly, is a user the same across two SKUs of your product? What if you have three, you know, modules in your product that people can independently sign up for? What is a user then? Right? Do you treat it as one across all three or not? And you wanna work with the marketing team on this, because you as a data team might invent 25 ways to do that. But the marketing team is telling you, I actually wanna write an email that is not in triplicate, even though we have three products, when we announce this. But I'd also like to be able to email, you know, people individually about that one product, or I wanna be able to tell them about product C when they're using products A and B. Which means you have to have a single user that spans all the products, which is real work, right? To deduplicate them, find a key that covers all three.
This is where people usually fall back on email anyway. But that might not be correct. So that's the work to be done. And without that, the marketing team will not be able to access your data correctly. And then you're back to square one, which we talked about: the data is incorrect. Right? Not because of, you know, Census, the tool, breaking, but because you're not aggregating at the right level for the use case. So that's super, super important. And I'd say most companies are still kind of early in figuring that out. A lot of our users that have been long-time Census users, so, you know, companies that have been using Census for two, three years, are pretty good at this, out of necessity. Right? But I think if all you've ever done is BI, this is gonna be the area where you need to improve. Same thing for companies. What is a company? Is a workspace in your app a company? Is a company a collection of workspaces in your app? Does your app even have workspaces?
Is it tied to an email domain, etcetera? Right? You have to figure these things out, because without that, you don't have the right key for aggregating those KPIs. Once you do, you can then say, okay, well, let's look at revenue by company, let's look at usage by company, let's look at the most active user in a company. And so that involves communication with the sales team, with the marketing team, with the support team, because that's who needs this fed correctly. And if you get this wrong for them, then they won't use your data at all, because the KPIs will be wrong, and then you're back to square one. So, yeah, you should get in the room. Now, what I recommend is don't try to boil the ocean. Right? Start small.
Get in the room. A lot of marketing and sales teams might not think the data team is there to help them, but you can come in with a single KPI. And then from there, you now have a relationship, you're helping, and then you can start to listen. You can start to ask the correct questions. And data teams are really good at, you know, the five whys, and kind of getting to the precision of: what do you mean by a user? And a lot of times, let's call it, marketing, sales, and support teams don't know how to express it; they just kind of feel it. And so, you know, a data person getting into the conversation there to figure out what exactly you mean by a company, and then distilling that into a set of SQL transformations, is super important and super useful.
And that's a lot more fun, I would say, as a task, and much more leveraged, than just waiting for them to email you about, like, a broken dashboard and then, you know, trying to figure out what the bug is in the dashboard. Right? It's a lot more interesting to think about these fundamental pieces to which all other KPIs can be attached. Yeah, I think that's probably the largest work to be done. The rest is just creating good collaboration processes. Here, it's like, don't get exotic, you know. Use a bug tracker. You know, if you can find tools that span the two teams, something like Census, then great. Right? Because then they can use it and get alerted, and you use it and you get alerted, and everyone's happy. But, yeah, you should probably have some kind of bug tracker, you should have some kind of standing meeting, you should maybe have a wiki for when you ship new KPIs. One of our customers is this company Loom, that does videos, right? Their whole shtick is, like, async video.
And their data team, whenever they ship a new KPI, or a new table, depending, right, they will ship a video with it, so that the team can learn what the heck this KPI is. It's like a little explainer video, and it's, like, super useful. So these are the ways you can improve your collaboration, and you have to figure out how to do this in a leveraged way. Right? So you don't wanna have a meeting with every single person on the sales team to explain your latest, you know, product usage KPI that's generating leads. Maybe a little wiki, maybe a little video. These are all the hacks that I've seen people use that are super impactful.
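To make the "what is a user across three products" problem concrete: here's a hedged sketch of the email fallback he mentions, with invented table names, and with the same caveat he gives, that email may not be the right identity key for a given business.

```sql
-- Hypothetical unified-user model spanning three independently adopted
-- products, deduplicating on lowercased email as the join key.
with all_signups as (
    select lower(email) as email, 'product_a' as product from product_a.users
    union all
    select lower(email) as email, 'product_b' as product from product_b.users
    union all
    select lower(email) as email, 'product_c' as product from product_c.users
)
select
    email,
    count(distinct product) as products_used  -- e.g. uses A and B, pitch C
from all_signups
group by email;
```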
[00:45:42] Unknown:
And to your point of releasing a new KPI or a new table, what is the workflow for a team that's using Census, and what are the opportunities for some sort of CI/CD workflow to actually validate the changes that you're making before you go live with them?
[00:45:57] Unknown:
Census is actually a CD tool. Right? It is a deployment tool. It's just that, again, given the evolving way in which the data world and the business world are learning things, you kinda don't lead with that right away. But that is what I think the job of Census is: to take data and turn it into a product, right, which you're deploying, and which is well tested. And it is a product because the whole definition of a product, in a way, is that it's not single-use. Right? It's not a single dashboard that you're making. It's potentially a single KPI that is touching ten tools. So it's very important that you validate it, test it, and scale it out. So the way Census works is, if you build a new KPI, we automatically ensure a few things. Right? One is you can think of syncs themselves in Census as entities that go through a kind of draft, stage, commit, deploy set of states. You're able to edit mappings and see what would be the effect. Would this change the schema in Salesforce, because you've now added a new field, or is it just going to change the type definition? Like, you can see all those things.
And then so you can kinda get a preview of the change that you will effect. Right? Then there is the compatibility test. Right? So we will prevent you from saving a sync, right, from deploying or committing a sync that has broken types. Like, if you're trying to map, you know, a datetime into a Boolean, that's not gonna work. Right? So those are all built in. And you could think of those as compile-time checks. Right? And then there are runtime checks. So let's say you add a new KPI, but you didn't realize that some of the time, like I said, it could be null. Right? But we will prevent you from syncing those nulls. Right? So, like, those rules are embedded in Census. And so at the end, you'll get, like, an alert that says, hey, a million rows, you know, were synced, but out of those, 55,000 failed the null test, and so those were not sent over. And now it's up to you to deal with that, right, to do with it as you will. Maybe that's fine and you just cut them out; they were never meant to be part of the query. Or you fix it. Right?
[00:48:02] Unknown:
In terms of your overall ideas and assumptions about what would be involved in actually building the Census product and what was needed in the operational analytics space, what are some of the assumptions that you had going into this venture that have been challenged or changed as you started to iterate on the product, work with your customers, and understand more of the nuance of what's actually involved in building this project?
[00:48:29] Unknown:
I think when we first started out, the first thing that we had to tackle is that it was not expected in any way for the data team to be involved in this side of the house. That was a real kind of, I'll call it, existential dilemma in the first year of Census. No one remembers these things, but in, like, 2019, it wasn't normal to have the data team in the mix. And the companies that used Census back then were very sophisticated, I'll call it. Both sophisticated and not super big. So there was a lot of trust still in the company, right, across teams, and across the data team and the rest of the team. And how to scale that out was something we had to learn. How do we teach other companies that this is okay? That not only is it okay, it's actually the right thing? Right? So I think that was the first assumption that we had to kind of question. We thought this was intuitive, but the customers didn't think this was intuitive, right? To say, you know, the warehouse could be in the mix here. So then there were two problems with that. Right? One is marketing teams are going, like, why would I work with my data team? Like, I have my own thing.
And then data teams, intuitively, and, again, it's hard to remember this, but the idea of the warehouse as a source rather than a sink for data was bizarre in 2018, 2019. Truly bizarre. They're like, that's not what a warehouse is for, which is true. Historically, that's not what a warehouse was for. But you had to see the direction in which Snowflake was going, right, and BigQuery and everybody else, to be able to be operational hubs. Right? It was starting to become possible. But if you grew up in CS or in data, yeah, the warehouse was not meant to do this at first, and we've come a long way to making that possible. So that was a lot to overcome: to say, hey, it's actually super cost effective and totally reasonable to treat the warehouse as an operational hub, rather than just, you know, a sink for analyzing billions of rows at the end of the quarter. And then finally, I would say there is a lot of misunderstanding and there are missed expectations around what people call real time.
This was another thing we had to tackle, which is, when you're first working with a marketing team or a marketing operations team, they're used to everything happening instantaneously, because the original way they've wired these tools is to be, you know, kind of instantaneous. But they're often getting bad data because of that. Right? They're not aggregating it correctly, and they're not getting all the cleanup that the data team has done. But then when you connect Census to your warehouse and out to a marketing tool, there is a delay now. Right? It's not true real time, not capital-R real time. And the good news is that's evolving, that's changing, that's improving.
I think we're gonna see the worlds of real time and warehouse-style batch continue to collide over the next few years, which is great. But in the early days, this was, again, an assumption we had to fight, right, which is that the data will be arriving slower, and users felt that wasn't okay. So I had to teach users, and we had to build facilities in Census for what I'll call fast and slow data. We created two kinds of pipelines, because your entire warehouse and dbt workflow cannot be run in, like, sub-second latencies. It's just not possible. So I taught users, and built features into Census, that say, here's, you know, 80% or 90% of your data that comes in at the slow speed, which can still be sub-minute, but not sub-second. And then if you need sub-second, here's your alternate path, and Census can participate in that so that you can make sure you're still getting some unification.
So a lot of the early fights with users were about helping them understand. And again, some of this is technical, like, we had to build things, and some of it was helping them understand their needs. Like, oh, you need real time. What do you mean by real time? And it turns out, if you tell them, how about 60 seconds? Oh, that's perfectly fine. So you see, you might have spent months trying to build real-time capability when all they needed was, you know, a minute, which is marketing real time. Do you know what I mean? So those were a lot of the assumptions that we had to debunk for ourselves in the early days. Continuing on the question of real time and streaming and being able to push the data as it's being generated, I'm wondering what you see as the potential future for streaming architectures in the space of operational analytics and some of the technical
[00:53:06] Unknown:
and organizational barriers that prevent that from being something that we're realizing today? Yeah. I think the fundamental constraint today is
[00:53:15] Unknown:
that true real time is on a separate stack altogether. And so, remember we talked about this? Like, we said, hey, you need a unified definition of what is a user or what is a company. Right? Those transformations ideally should not be duplicated, because that's really hard to maintain. And some of them may even require, you know, batch data to be able to determine what those aggregations are. And so that's probably the fundamental limitation. You can do real time, but it's partially independent. Right? There are bridges between the two that you can create, but it's kind of independent.
And so that's what people do in Census and in the market today. They kind of have a fast path and a regular path, and the goal is to try to keep the logic between them as simple and as separate as possible, so every time you make a fix, you don't have to go fix it in two places. I think over time, the best thing that can happen here is for all the warehouses to continue to improve their real-time capability, right, their streaming capability, whether that's through streaming materialized views with products like Materialize, or just faster, streaming ingestion of data into the warehouse. All these things will get us closer. I think there's a speed-of-light problem, of course. Right? So I think engineering teams may always rely on a different stack for kind of pub/sub systems at high scale. But for most of the rest of the business, as the warehouses and some of these tools that bring streaming to that stack emerge, we'll probably get good enough to be within a second or a few seconds. That should get us what we need.
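To make the streaming-materialized-view idea concrete, here is a minimal sketch in Materialize-style SQL. The pageviews source and its columns are assumptions for illustration; the point is that the view is kept incrementally up to date as events arrive, rather than recomputed on a batch schedule.

```sql
-- Illustrative only: Materialize keeps this view incrementally up to
-- date as rows stream in, instead of recomputing it in batch.
-- The pageviews source and its columns are hypothetical.
CREATE MATERIALIZED VIEW user_event_counts AS
  SELECT user_id, COUNT(*) AS total_events
  FROM pageviews
  GROUP BY user_id;
```

A downstream sync could then read from user_event_counts with far lower staleness than a nightly batch model.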
[00:54:57] Unknown:
In terms of your experience of working with your customers and seeing the applications of Census, what are some of the most interesting or innovative or unexpected ways that you've seen it used? Oh, that's a great question. It's like you keep seeing new ways people use it.
[00:55:11] Unknown:
So, one of the neatest things I've seen. You can think of the first couple of scenarios people do when they get started, right, as taking one or two KPIs, you know, like usage of your product. It's a classic: the sales and marketing teams always want to know exactly how people are using the product. And it might be some of the most basic KPIs, like, how many times have they used the features, or how many times have they logged into the product? Sometimes it's really basic KPIs like that. But it's a signal. Right? And so you add that to a user or a company in all those tools, and then your teammates no longer have to open two tools to figure out what's going on, and they can talk to the customer in a better way.
So that's kind of like level one. But what you then see people do, once they have this kind of power in hand, is start to shift what I'll call logic that is very proprietary and imperative in tools like Salesforce, Marketo, etcetera. They actually move the logic into SQL and into the warehouse, which is both a more standard, more open language, right, than learning Apex, which is the programming language of Salesforce, and more functional rather than imperative. Right? It's not do this, and then do this, and do that. They have these workflow builders and process builders in Salesforce that are very much imperative code in a way, even if they have a UI.
But SQL, in some ways, aside from Excel, is maybe the world's most widely deployed functional language. Right? And what I find our customers doing is taking hand-tuned processes that they wrote in Salesforce and moving them into the warehouse, because they have their own playpen. Right? Like, the data team has their playpen of, here's how you generate the people data mart, etcetera. And then you might have a sales ops person who's in there and getting more technically savvy, and their job is to automate everything on the sales side. Right? They have to do all these things, and they'll start to create really interesting business workflows in SQL. So it's not at all what you think of as what a data team would do, which is to build a KPI. What they're doing is saying, well, I wanna do an assignment rule, and Salesforce gives you convoluted ways to do this. It's hard to maintain, but it exists.
But then they're like, wait, the inputs to that rule are already in the warehouse. So why don't I just express that as a model, a transform, and generate a new, you know, column in the data? And then that's how they'll do assignment. There's one user I saw recently do this, and it's super neat. Right? They're doing work assignment for salespeople in SQL. They have the whole sales team in SQL, because it's all ingested, and they have all these KPIs, all these things, and based on that, they're basically taking a bunch of rows and saying, this is the sales owner, this is the sales owner, this is the sales owner, and they're doing that in SQL. It's super neat, right? It's more maintainable.
If you fix it, it just works, because it's gonna recompute the table as a whole. If you change your rules, you don't have to figure out, what do I need to rerun? There's no such thing in Census. Right? If you change the state, if you change the table, we will synchronize. That's the whole point.
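As a hedged sketch of this pattern, an assignment rule expressed as a warehouse model might look something like the SQL below. The table, columns, and routing rules are hypothetical stand-ins, not the actual customer's logic.

```sql
-- Hypothetical assignment rule as a declarative transform: each run
-- recomputes the owner for every account, so changing a rule never
-- requires working out which rows to rerun.
SELECT
  a.account_id,
  CASE
    WHEN a.annual_revenue >= 1000000 THEN 'enterprise_rep@example.com'
    WHEN a.region = 'EMEA'           THEN 'emea_rep@example.com'
    ELSE 'smb_rep@example.com'
  END AS sales_owner
FROM analytics.accounts AS a;
```

A sync would then push sales_owner back into the CRM, replacing the imperative workflow-builder version of the same rule.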
[00:58:21] Unknown:
So that's probably one of the most unusual, interesting things I've seen recently. And in your experience of building the business and growing the product and helping to be one of the defining companies in this space of operational analytics, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:58:39] Unknown:
Ultimately, you know, I think tools are about making people more awesome versions of themselves. Right? And so I think a lot of the interesting problems are in people and team design and company design. Going back to your first question, I think the ultimate way to describe the value of Census is that it creates a feedback loop. Right? It's not just saying, here, put some KPIs in Salesforce. It's: build a complete feedback loop from the action to the source data and back. Right? And the hardest thing is helping data teams and executive teams understand that they should invest in this approach.
And this is gonna piss some people off, but I'd say the biggest problem that I've come across, and that continues to be a huge, huge, huge pain and will be a pain for a while, is that most data teams, or most business intelligence teams, report to a CFO. Nothing against CFOs, but their job is kind of compliance and accounting and getting the numbers right, which is very important, and that's why the data teams have the best data. But it's not like a growth team, which is all about feedback loops and increasing the velocity and scale of a company. And I think that's what data teams should be modeled after. They should be modeled after growth teams, and they should be bigger, more powerful, and more valuable than them, because growth teams are hyper focused on product growth and, you know, A/B testing the website. And if you're Facebook, that's basically everything. At most companies, data teams have the broadest purview.
You know, if you can get the executive team to realize that this is an investment and this team should be one of your most valuable teams, then we've done our work. I think that's actually, in a lot of ways, the job I have as the company grows: to help companies realize this kind of shift, and it's nontrivial. Historically, a data team was never in the hot seat. Right? It was never in the critical path of business action. It was always a let's-reflect-back on how things went last week or last month. I'm excluding data teams that are inside the engineering org. Right? You know, the data team at Uber was really building petabyte Kafka streams, and those I consider different, but that's a very small set of companies that operate that way. The broad set of companies in the world don't operate that way. And so my goal is just to try to figure out how to help them see the data team as foundational, as almost a platform rather than an end user.
[01:01:15] Unknown:
And for people who are interested in exploring the space of operational analytics and being able to build those complete feedback cycles, what are the cases where Census is the wrong choice and they might be better suited either with one of the other off-the-shelf tools or building their own homegrown systems?
[01:01:31] Unknown:
I think there's a magical time when you're relatively small as a company where, at least on the operations side, most of your data actually sits in one tool. And it's kind of a golden age if you can do that. Right? There's a period of time when you can almost say everything's in Intercom. You know? At my last startup, there was a while there where there was no need for a warehouse. There was no need for those things. 95% of what we wanted to do was just in Intercom, whether that was messaging customers or looking at our metrics; it was all in there. It was really, really powerful. That obviously breaks down inevitably. Right? Either because your business becomes more complex or because of your scale, one of the two. So the first thing I would say is, if your company, if your product is still simple enough that most of it can sit in a tool like that, then you should not focus on almost any aspect of, quote unquote, the modern data stack. You should focus on getting users and improving activation, improving retention, etcetera.
And then once you have that, of course, you're gonna invest in the modern data stack. Right? So when it comes to the build-versus-buy decisions there, I think it's becoming increasingly unlikely that people should build this themselves, should write this code themselves. There are always scenarios, right? Like if your tools are very bespoke. But even there, you know, we support custom destinations in Census, so you could just use our framework to connect to your custom stuff, and then you'd still get all of our monitoring, all of our incrementality, all these things that you really want. So I think historically there were reasons to build this yourself, but I think they're diminishing.
I mean, historically, another reason to build this yourself would have been, you know, compliance and security practices. But again, the way Census is designed, we don't store your data. We're SOC 2 and HIPAA compliant, all these things. The data is homed exactly where you want it to be, so we're not even shifting what locales your data is stored in. So a lot of those reasons go away as well. I guess if you have on-prem, if you have a mix of cloud and on-prem, I would say maybe get off on-prem, but I think that's a reasonable situation in which you should probably do some of this yourself. But you should still be trying to build this feedback loop. I would say that people should not wait until they're on the cloud to build those feedback loops. But I have seen companies for whom it's faster to just say, screw it, let's move to the cloud, and then we'll be able to use these off-the-shelf tools.
So that's another good reason you might wanna build this yourself. Yeah, those are probably some of the reasons I would state for doing this. And then maybe you have extremely intense real-time requirements, and for that, you need to put this into your real-time infra, and Census doesn't fit into your real-time infra. Again, though, that means it's almost certainly the engineering team that owns this rather than the broad data team, and it means you're also not dealing with a classical data warehouse. So you're already in a pretty different situation there. And again, I think a lot of companies get enamored with that, and they don't realize the burden of managing it is actually very, very high. But that is probably another case where you might wanna do it homegrown.
[01:04:44] Unknown:
And as you continue to build out the Census product, what are some of the things you have planned for the near to medium term? So, you know, the core
[01:04:52] Unknown:
goal of the company is to help businesses do more with their data. Today, like I said, we're trying to help these data teams who have made these investments in moving to the cloud, who have implemented great BI stacks, and get them to serve those insights directly to the business teams. Right? That's what we're trying to do today: extract so much more value out of the work that data teams have been doing. And then I think there's a couple of things that we wanna work on next. One is to enable more nontechnical people to take action on that data. Right? You and I talked a bunch about this: how do we help those people come in and participate, and participate in such a way that they're not just consuming the charts, but they're actually generating new insights themselves and owning KPIs that they care about, all of which fits into the overall data lineage?
And just ensuring that you never screw up, right, that you never push bad data, and really focusing on those kinds of things. And beyond that, you know, let's talk again next year.
[01:05:45] Unknown:
Are there any other aspects of the overall space of operational analytics and the work that you're doing at Census that we didn't discuss that you'd like to cover before we close out the show? I think we covered most of it. I think the
[01:05:56] Unknown:
key idea here that I think we got across together is that operational analytics, and I don't even know if space is the right term, it's really a direction and an outcome for data teams, right, is to say: you are no longer just on the receiving end of all the, let's call it, all the crap that comes down from management and from other teams; you are now actually driving certain functional teams in the company. I have yet to meet a data team that isn't frustrated by the ratio of time that they're reactive to the needs of their stakeholders versus proactive about building new, interesting functionality. And the goal here is to say, let's move you from reactive further along the spectrum towards proactive, and help you impact the teams directly and be, you know, idea generators for those teams. And I think that's
[01:06:55] Unknown:
really the journey that we're on here with our customers: how do we help make that happen? Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. I think the most exciting thing continues to be, like I said, reducing the latency in the warehouse stack. I think
[01:07:18] Unknown:
if latency stays high, there will always be two stacks. And so I would really like, you know, Snowflake, Databricks, BigQuery, and all these folks to continue to invest in that. Even though they're analytical databases, right, that were not originally intended to be used for low-latency workflows, now they can see the light: there's an opportunity for these warehouses to be hubs around the entire business. So the biggest gap is just continuing to tighten the latency
[01:07:49] Unknown:
on those tools, and I think that would have a huge impact on the entire market. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Census and your perspective on the overall space of operational analytics and the potential benefits that it can have for organizations. It's definitely a very interesting space and one that I'm excited to see continue to grow and evolve. So thank you for all of the time and energy you've put into that and your work at Census, and I hope you enjoy the rest of your day. Thank you. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Boris Jabes and Census
The Genesis of Census
What is Census?
Operational Analytics and Actionable Intelligence
The Rise of Operational Analytics
Data Quality and Lineage in Operational Analytics
Linking Ingestion and Operational Data
Implementing Census
Data Modeling and Collaboration
CI/CD Workflow in Census
Assumptions and Lessons Learned
Future of Streaming Architectures
Innovative Uses of Census
Challenges and Lessons in Building Census
When Census Might Not Be the Right Choice
Future Plans for Census
Closing Thoughts on Operational Analytics