Summary
There is a constant tension in business data between growing silos and breaking them down. Even when a tool is designed to integrate information as a guard against data isolation, it can easily become a silo of its own, where you have to make a point of using it to seek out information. In order to help distribute critical context about data assets and their status into the locations where work is being done, Nicholas Freund co-founded Workstream. In this episode he discusses the challenge of maintaining shared visibility and understanding of data work across the various stakeholders, and his efforts to make it a seamless experience.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write workflows as code. Prefect specializes in gluing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.
- Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping to precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24×7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.
- Your host is Tobias Macey and today I’m interviewing Nicholas Freund about Workstream, a platform aimed at providing a single pane of glass for analytics in your organization
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Workstream is and the story behind it?
- What is the core problem that you are trying to solve at Workstream?
- How does that problem manifest for the different stakeholders in an organization?
- What are the contributing factors that lead to fragmentation of visibility for data workflows at different stages?
- What are the sources of information that you use to build a cohesive view of an organization’s data assets?
- What are the lifecycle stages of a data asset that are most often overlooked or un-maintained?
- What are the risks and challenges associated with retirement of a data asset?
- Can you describe how Workstream is implemented?
- How have the design and goals of the system changed since you first started it?
- What does the day-to-day interaction with workstream look like for different roles in a company?
- What are the long-range impacts on team behaviors/productivity/capacity that you hope to catalyze?
- What are the most interesting, innovative, or unexpected ways that you have seen Workstream used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Workstream?
- When is Workstream the wrong choice?
- What do you have planned for the future of Workstream?
Contact Info
- @nickfreund on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today, that's a t l a n, to learn more about how Atlan's Active Metadata platform is helping pioneering data teams like Postman, Plaid, WeWork, and Unilever achieve extraordinary things with metadata.
When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey, and today I'm interviewing Nicholas Freund about Workstream, a platform aimed at providing a single pane of glass for analytics in your organization. So, Nicholas, can you start by introducing yourself? Thanks for having me on the show. I'm Nick Freund. I'm the founder and CEO of Workstream.io. I started the company really born of
[00:01:47] Unknown:
so many pain points I had experienced in my career, first as an analytics person and then later an operator working really closely with our data function.
[00:01:55] Unknown:
And do you remember how you first got started working in data?
[00:01:58] Unknown:
Yeah. I mean, it really started at the very beginning of my career. I joined as one of the first analysts at Tesla, then called Tesla Motors. Back then, about 15 years ago, very early in the Tesla journey, I joined when there were about 200 people at the company, pre delivery on the original Roadster. Anyway, I was there for a long time supporting the operations and manufacturing teams, bringing the Tesla Roadster and Model S to market. And then more recently, I ran operations at a SaaS company here in New York, where I built lots of different functions and collaborated with and worked really closely with our data team and our business operations team. And we were just a very, very data driven culture, and we experienced lots of the same problems I had issues with around managing our analytics assets at Tesla.
Yeah. That's really the germ of what has become our company.
[00:02:52] Unknown:
And so can you describe a bit more about what the Workstream product is and some of the story behind how you decided to invest in actually building a business around it, and why this is the problem area that you wanted to spend your time and energy on? One way to think about our product is a single pane of glass for your analytics assets. We also frame it as the analytics hub. It really is a place
[00:03:14] Unknown:
where teams can bring together disparate data and analytics assets into a single unified repository. I mean, that includes, like, not only the assets themselves, but, like, all of the important and critical business context around it, like documentation and training content, which becomes really useful for all the other folks throughout the organization to be able to operationalize your assets. And so it serves as that kind of centralized single pane of glass access layer. But then it also facilitates collaboration and workflows all directly in context of your data.
So that's a little bit about our product. But, yeah, like, as I mentioned quickly before, it's really born of the issues I had experienced. Right? And I think the initial catalyst for me was some very acute pain points around my workflow as an operator working with our data team. And, you know, as simple as I think everyone's experienced, someone sends you, like, a screenshot of a dashboard in Slack, and you're going back and forth and having a conversation, trading insights and providing feedback, and it was a very disjointed, painful, and manual workflow.
But I'd also experienced all of these problems around finding analytics assets that either I had created or others had created and supplied to me, and you can never find what you were looking for. And even more importantly, you couldn't remember, like, where were we when we were looking at this last? And, like, what were we doing with the data? It's those pieces of not just what the data is, but what is the business doing with it, that personally really fascinate me. And, you know, I felt at the time, and I still do, that there had been relatively little investment by companies or by products in kind of solving those last mile issues.
And I felt like there was a real opportunity to kind of solve those key pain points for data teams and, honestly, everyone in the organization who works closely with your data. That was really how I got passionate about it, just feeling these problems myself. And I met with other entrepreneurs who had done similar things, really not in the data and analytics space, but really thinking about specific workflows for specific personas within organizations, and how you could facilitate workflows kind of around the existing productivity tools that they already use. Of which, of course, there are so many that data people use, from your warehouse to whatever you're using for transformations or data pipelines, etcetera, etcetera. And so I thought no one has really looked at how do we wrap or extend all of these really powerful technical tools for the business user, and build something that's truly integrated and dynamic. So, yeah, that's a quick early story of how I came upon the problem area and then why I decided to dedicate the next phase of my career to solving some of those problems.
[00:06:17] Unknown:
In that context of being able to build a single repository of information about all of the different data assets that you have and information about them, from that framing, it sounds similar to some of the different sort of metadata catalog, data discovery platforms that are out there. And I'm wondering what you see as the missing piece of that approach that leaves out the business users and stakeholders and some of the ways that your work at Workstream is either a different approach to that or is maybe a step above that kind of metadata catalog, data discovery repository.
[00:06:58] Unknown:
With very knowledgeable folks like yourself, like, one of the first questions I get is, like, oh, is this a data catalog? Right? And the short answer is no. It's not. You know, we know what data catalogs are. We've talked to many, many folks who use them. We have customers who also use data catalogs, and it's a big problem space. But, fundamentally, we're trying to build something different. And the first way to think about it is that our perspective really is from the data team to the business, and then the business back to the data team. Right? And some of the things that are unique in kind of what we're doing: as an example, we don't map all of the tables in your data warehouse and help you build out documentation around the columns in a table. Like, there are reasons that you would wanna go ahead and do stuff like that. But, fundamentally, those are for people who can write SQL. Right? And can, like, build analytics themselves. And so a data catalog is very much, from our perspective, a product that's built for, really, the technical users or, like, citizen analysts throughout the business.
Salespeople generally aren't gonna have access to your data catalog. Right? Customer success managers or product managers might, but frontline business folks aren't. They're normally gonna see the output of what the analytics team creates, like the actual data products themselves. So, really, our integration layer is all around the data products. Right? And that can be everything from your BI solution and all the dashboards and reports that exist within it, or your multiple BI solutions. We're very pragmatic that not everything is done to best practice. Right? So, like, what about all those random spreadsheets that are, like, floating around throughout your organization? Right? Or what about the complex recommendations and insights that might just live in a document? Right? We treat that as a first class asset as well. What about operational data that's getting pushed into various SaaS applications? All of these are what we would call an asset in our system.
And so what we do is really provide a unified repository at that layer, and then we pick up and help facilitate workflows from there to the business. Right? And so that could be, for example, a training video, so all of the customer success managers know how to use the customer 360 dashboard and all the other data assets that are available to them. And what's unique about our product is it's all directly integrated with the tools that teams already use. And so, for example, if you are then in Salesforce looking at, you know, live data that's been pushed into an account, or even some analytics that might have been built natively within Salesforce, our concierge, called the Data Concierge, brings critical documentation and context into the consumption layer alongside the tools that teams already use.
So the repository is there as an access layer for more mature organizations, and you can almost think of it as a single drive for business users to go ahead and access this stuff. But then they can still engage with the data in the systems that they already use, and we're really then augmenting that from a workflow perspective. So those are some of the ways that we think that we're different. And I think the way I'll wrap it up is: we look at our deployments. Right? We land with, like, very, very technical users normally, like the data teams, the analytics teams. But the majority of our users are nontechnical users. Right? You end up with a deployment, and there are hundreds of folks who, like, have never written a line of code in their entire life. And so it becomes very much this kind of internal network around
[00:10:38] Unknown:
accessing data and collaborating on it. As far as those interactions, to your point, you know, it's the technical users who bring in the product, and then the, I don't know if nontechnical is necessarily the most appropriate term, but the people who don't have technology as their core focus, are the ones who are going to be interacting with it more predominantly. And I'm wondering what are some of the types of interfaces that you're integrating with to be able to provide that interaction, and the types of information that those end users are looking for when they decide that, oh, Workstream is the right solution for me, because I just wanna know, you know, did this table get updated? Is this spreadsheet using the most current version of our sales figures? Whatever. You know, just curious if you can talk to some of that kind of user interaction and the types of information that they're looking to get at and understand as they're doing their day to day job, and how you're providing that at Workstream.
[00:11:33] Unknown:
Yeah. What's really fascinating to me is just, like, we are seeing more and more how customers are using the product in ways that we didn't even expect when we built it, despite the nature of it. Where we get teams really engaged is when it dawns on them that, hey, this is, like, a single place for us to go to have access to all of our, like, analytics. Right? And then I don't have to, like, shuffle between tabs, where they have to go to the shared drive for this one thing, and they've gotta go to the native directory within Tableau.
That's how they operate. Right? Or there's data that lives in Salesforce, and they're bookmarking that within their browser. So we're kind of displacing these behaviors of, like, where is all of this stuff, and it's fragmented across their processes. And so it creates kind of lots of those pain points. And so that all now becomes kind of consolidated in a single place. With regards to, like, the interfaces themselves, and this is, like, tangentially relevant to the audience, but, like, what's interesting about our product is it's actually incredibly web development heavy. And so a lot of the complexity of how our product is implemented under the hood is, like, how do we interoperate across all of these user interfaces of, like, different tools?
We have a web app. And so, again, you can think about this as the drive that you go to to, like, access content and documentation. So you can find things via the library. You can search for all of the things that you would expect. Teams can build out collections of assets for various teams and end users. And there's some things that you can go ahead and also then view within our product, and there's lots of analytics assets that are designed in this way. Not specifically for this use case, but, like, you can view a dashboard in a web app. But in a lot of cases, the vast majority of them, that's not how things work. And so in that case, we'll kind of send you back out to the actual source of truth, right, the actual data tool, and then we have our Chrome extension that lives alongside it that basically brings our experience kind of into that native interface.
And so it's kind of complicated under the hood from a development perspective, but it comes off as pretty seamless kind of to the end users. Now I think for specific use cases, what we're seeing a lot right now are data teams finding success using this to enable kind of more complex go to market teams, larger go to market teams. Now you've got an organization with hundreds of folks supporting customers, or hundreds of folks working in a sales and marketing capacity, and
[00:14:15] Unknown:
there's just a lot of manual back and forth, or, like, training meetings teams have to set up to teach folks how to get up to speed and use what's been created for them. And we can streamline all of that and save data teams, as well as everyone else, lots of cycles. And, yeah, I generally try not to, to your point, refer to them as nontechnical folks. Stakeholders of the data team come in, like, all shapes and sizes and forms. Right? They're just the others in the organization whom you work with. Right? And for us, for our case right now, you know, those are often kind of these types of folks. And the use cases start with consuming curated knowledge, but then it extends to collaboration.
And that could be everything from, hey, like, I have a question on this one specific thing that I'm seeing, and that conversation all happens directly in context. Right? And teams can extend that with, like, rich annotations, drawing on top of what they're seeing. They can include video content, a more complex explanation. And those workflows kinda live across the life cycle of your analytics assets. Right? So it's everything from something that you're building brand new, that is in development and you're working on collaboratively, extended all the way to end of life as well.
[00:15:39] Unknown:
As far as the kind of flow of people's work and the ways that they are interacting with the different data and trying to get insights about the underlying assets: for people who aren't using something like Workstream, I'm wondering what you see as some of the main points of friction and the sources of fragmentation in the availability of that information, or some of the challenges that they have to overcome in being able to gain that same level of insight in their work of just being able to work with the data, understand the
[00:16:10] Unknown:
context of the data, sort of the freshness, quality, things like that. You know, there are ways of trying to solve some of these problems that we help you solve. Right? And the easiest way to think about it is that teams would introduce new tools into their environment, or they would repurpose tools to accomplish this. Right? So if you're thinking about documentation, there's a lot of different ways that you could build out business facing documentation. That could be, like, a doc, like a Google Doc that lives in Drive. That could be something in your intranet. Right? There's all of the tools they would use for building out kind of internal documentation. There's lots of different ways that folks would accomplish that, but there's nothing about those tools that's designed specifically for that use case. Right? If you're thinking about collaboration, well, there's a lot of different ways that data teams collaborate with business stakeholders.
And a classic one that I talk about all the time is fulfilling requests. Every data person's "favorite" thing, and I'm saying favorite with air quotes, is, like, accepting requests from business folks and then, like, delivering on them. Like, can you build me this new dashboard? And there's two main forms of this that I see. There's a Slack channel, data-marketing, data-whatever-function, and that serves as, like, a way for people to ask questions. So that's one end of the spectrum. And then the other side of the spectrum is, like, you use some type of ticketing or service desk system. You, like, introduce Jira Service Desk into your environment.
Two very different strategies there that come with a number of different problems. One really fits into the existing, like, agile workflow of the data team. It's asynchronous in nature, and it allows kind of that team to fit that into their kind of existing work. The other is probably more collaborative, but it's interrupt driven and it's all synchronous. Right? So there's two different, like, sets of inherent problems or trade offs that teams have to make there. So those are the other tools they would use for kind of stakeholder collaboration.
And then the last thing would be, like, just a good old, honest, like, meeting, which, like, nobody enjoys sitting in meetings all day long. Right? And if you're a high growth company, and people are joining all the time, you're joining meetings all of the time to train new people to use what has been built for them. Right? And so you can get along doing that, but it's not, like, a good use of anyone's time. Right? And these are incredibly, like, smart, quite frankly, well compensated individuals who really have better stuff to do.
And so why should they be focused on the rote when you can have them be focused on the strategic, and have them be focused on, like, building data products? Right? So we're trying to help, like, help with that. And in that way, you can think of our product as being, like, a pretty unsexy product in a lot of ways. Right? And we fully embrace that. I think the reason why a company would move off of kind of stitching together some of these workflows by repurposing tools to something dedicated like Workstream is that they've gotten to a point where they're at the size and complexity that something like this solves an acute enough pain point. Right?
And so that's normally when you can see people's interest level change, right, from being like, hey, this is really interesting, I can see us using the freemium version of your product, like, here and there, to, this will actually be transformational for us.
[00:19:57] Unknown:
Prefect is the data flow automation platform for the modern data stack, empowering data practitioners to build, run, and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn't get in your way, Prefect is the only tool of its kind to offer the flexibility to write workflows as code. Prefect specializes in gluing together the disparate pieces of a pipeline, integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100,000,000 business critical tasks a month. For more information on Prefect, go to dataengineeringpodcast.com/prefect today. That's prefect.
Another interesting aspect of the question of working with data assets, particularly given the number of different owners of a particular piece of information and the different people who are interacting with it, is the question of which life cycle stages are most often overlooked, under maintained, or completely ignored. I mean, one of the big problems we see
[00:21:13] Unknown:
customers having is just the concept of data asset sprawl or dashboard sprawl. And this is even worse in an organization that's invested a lot in self-service analytics. So if you're an organization that uses something like Looker, as an example, like, you can have thousands and thousands of dashboards floating around out there. The reason I bring that up is I think the concept of life cycle management is really interesting when you think about it as a potential solution to reining in something like dashboard sprawl. Right? To me, the most underrated, or at least under maintained, phases are the initial stages of developing something, building something new, and specifically, like, the workflow around that. And if you believe that the output of a data team is a product, right, the data person is both the product manager, the actual engineer that's building it, and then the, like, customer support person who triages bugs and then fixes it. And so there's a lot that's put on that person.
So how does that data professional live best practices from, like, a product management perspective? And a lot of that is about, like, listening to the customer, working collaboratively with the customer to design and build a solution that's gonna meet their needs. And so where I think there's underappreciation is just how difficult the act of building is. Of course, that is difficult in many different ways, shapes, and forms, but the interpersonal pieces of, like, product management, it's just a lot for teams to manage. So to me, I think that's an area we need to spend more time on. And I think if we're scoping things out better and building better data products more collaboratively, you're gonna need to build fewer over time.
I would then say the last piece is end of life. Right? And I think teams spend a lot of time building and less time taking things to end of life, and a lot of that's just because you quite frankly don't have the time for it. And when you've enabled self-service analytics, like, is it your job to maintain all of the random stuff that, like, any citizen analyst in your org can create? I mean, that's a really interesting question. Right? Like, whose job is it to govern? There's lots of different ways that we've thought about that last piece, and that's really interesting and fascinating. Like, look, in our product, you can do everything as basic as, like, marking something as archived or expired.
Or you can say, hey, this thing is expiring. Like, the data is temporal, and you shouldn't trust it after this date. And we'll, like, alert you and send all these flags to, like, the end user that, hey, don't trust this thing. More broadly, like, you follow, like, the 80 20 rule. 20% of what's there is good and 80% is bad. So how do you discover what those 80% are? And so we think a lot about, well, how do we help teams triangulate the qualitative information they get with, like, the quantitative, which is a lot more actionable. So, like, how do we help teams en masse take assets to end of life based off of what people are using? You know? And I think there's more work for us to do there as a product, but I think that's an area that really has not been solved. And I think the overall state of the data environment is this trend from order to entropy.
And that momentum is very hard to curtail. And what it results in is that, like, yearly or, like, quarterly project of, like, going in and, like, deleting stuff manually for, like, a week. Right? Because the problem has gotten so out of hand. And so how can we make that all happen automatically? Right? Because, again, the project of, like, deleting the reports and dashboards and whatever they are, like, that's not a good use of anyone's time.
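The usage-based end-of-life triage described above, flagging the assets nobody looks at so they can be retired en masse, can be sketched in a few lines. This is a hypothetical illustration, not Workstream's actual logic: the field names (`last_viewed`, `views_90d`) and thresholds are assumptions standing in for whatever a BI tool's audit logs actually expose.

```python
from datetime import date, timedelta

def flag_stale_assets(assets, today, max_idle_days=90, min_views=5):
    """Return names of assets that look like end-of-life candidates:
    either not viewed within the idle window, or barely viewed at all."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = []
    for asset in assets:
        idle = asset["last_viewed"] < cutoff          # nobody opened it recently
        unused = asset["views_90d"] < min_views       # almost no usage overall
        if idle or unused:
            stale.append(asset["name"])
    return stale

# Illustrative usage records, as might be pulled from a BI tool's audit API
assets = [
    {"name": "customer_360", "last_viewed": date(2022, 9, 1), "views_90d": 240},
    {"name": "old_q1_report", "last_viewed": date(2022, 2, 15), "views_90d": 0},
    {"name": "adhoc_churn", "last_viewed": date(2022, 8, 20), "views_90d": 2},
]

print(flag_stale_assets(assets, today=date(2022, 9, 10)))
# → ['old_q1_report', 'adhoc_churn']
```

In practice the quantitative signal would be combined with the qualitative one mentioned in the interview, for example requiring an owner's sign-off before anything is actually archived.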
[00:25:13] Unknown:
In terms of the Workstream implementation details and some of the
[00:25:19] Unknown:
types of integrations that you need to build and maintain for people to be able to get a broad enough coverage of their different systems and end user interfaces that they're trying to work with. So our integrations, technically, from an implementation perspective: we generally tie in at the API level to your, like, various systems and tools. Right? The first category that we think about is, like, the assets themselves. So we'll connect in to your BI tool, or your multiple BI solutions, and all of the various places that you have data assets. And again, that could even be, like, operational systems where you're pushing data. Right? So you can connect us to those things, and then you can even manually add those assets into our system as well, like, as a one-off when you might be building it. And that's literally as simple as, like, you grab the link to the thing and you just register that asset.
And so that's kind of the first step: helping teams get their stuff into our repository. I would say for the next big category of tools that we tie into, we think of that next layer less as being the data warehouse itself; it's more the solutions that already exist that tie into your data warehouse. A great example here is we have a dbt Cloud integration. You can connect us to your dbt Cloud project, and we'll help you do a lot around triangulating data quality and data freshness across all of those various assets, informing both you and your stakeholders when there are underlying issues. And there's a really interesting roadmap for us there around data observability solutions.
And then we also tie into any of the workflow and communication tools that you already use. That could be your messaging solution like Slack. That could be your team's agile project management tool. So we can connect, as an example, with your Jira project, and if folks are spotting bugs in a dashboard, they can start a conversation with you, and it will automatically create a ticket in your backlog. The benefit of something like that is the collaboration is now all happening in context with the live data, but you can prioritize the delivery of that work alongside all of your other work, and we'll sync your statuses and all that good stuff back and forth. So the implementation really starts with connecting us with those tools you use. And if you go to our integrations page, workstream.io/integrations, you'll see that we already have 20 plus integrations with more coming every day. I would generally say that the minimum required to get value is 2 connections. Right? So you connect us to your BI tool and dbt.
You're probably gonna be able to find value out of our solution that you wouldn't be able to otherwise. Or you connect us to your BI tool and your Jira projects. That's also a minimum viable workspace or deployment for us. But the more that ends up in our system, the more that we can live up to that single pane of glass vision and that analytics hub vision. That covers a bunch of the technical implementation details. As for actually rolling our solution out, we work closely with customers. It's self-service, and you can go play around with the product on your own, but we normally tell customers to start simple. Start small. Right?
So pick a small pilot group of users that you're experiencing some of these pain points around, and just focus on the top data assets you get questions on all the time or that you wanna onboard those users onto. Add those into our system, build out some documentation, and then you're pretty much ready to roll this thing out. The users, and any of those others throughout the organization, can join that workspace and access what you've curated for them literally just by going to app.workstream.io and using Google single sign-on to get access and auto-join your workspace.
So there's some training that's involved in just getting the initial deployment up to speed. We help folks with that, or folks self-service there. But once you're up and running, it's pretty seamless for others to get access.
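As a rough sketch of the Jira flow described above, an in-context conversation on a dashboard could be translated into a create-issue payload. The conversation structure here is hypothetical, not Workstream's real schema; the payload follows the general shape of Jira's REST create-issue body:

```python
# Hypothetical: turn an in-context dashboard conversation into a Jira issue
# payload. Field names follow the general Jira REST create-issue shape.
def jira_issue_from_conversation(conversation: dict, project_key: str) -> dict:
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Bug"},
            "summary": f"[{conversation['asset_name']}] {conversation['title']}",
            "description": (
                f"Reported by {conversation['reporter']} on "
                f"{conversation['asset_url']}\n\n{conversation['body']}"
            ),
        }
    }

payload = jira_issue_from_conversation(
    {
        "asset_name": "Revenue Dashboard",
        "title": "Totals look doubled this week",
        "reporter": "pat@example.com",
        "asset_url": "https://bi.example.com/dashboards/7",
        "body": "Week-over-week revenue jumped 2x with no campaign change.",
    },
    project_key="DATA",
)
print(payload["fields"]["summary"])
# → [Revenue Dashboard] Totals look doubled this week
```

The payload would then be POSTed to Jira's create-issue endpoint, with the resulting issue key synced back onto the conversation so status updates flow both ways.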
[00:29:44] Unknown:
In terms of the impact on teams of having this way to unify visibility of the different assets that they're working with and the different data that's within the company: as you mentioned, 1 of the things that you're hoping to do is free up a lot of the engineers' time from some of the toil that's involved with just being able to share information about what data there is, and free up time in meetings no matter how many donuts there might be. I'm wondering what you see as the desirable long range impact on the behaviors, productivity, and capacity of both data teams and the broader organization as they get into the flow of using Workstream and being able to popularize that information without having to have as much manual involvement.
[00:30:33] Unknown:
So, putting Workstream aside for a second, I think 1 of the really interesting questions that gets asked is, how do you measure the success of a data team? And there are lots of different answers to that, lots of different ways you could measure it, and hot takes that are controversial. 1 that I actually do believe is: how successful has your data team been at creating shared consciousness about your data within the organization? And what is shared consciousness? What does that actually mean? It's an intentionally fluffy and squishy term. In its simplest form, it means that everyone in your organization has sufficient empathy for everyone else and understanding of the data that they can do their work independently.
Right? And fundamentally, in a modern organization, every decision, every action that somebody takes really should be informed by the data. And if it's not, it's probably a subpar action. Everyone talks about being data driven, but actually living up to that standard is very, very difficult. And you're never gonna get there if you're living in some version of a service model, or a model where there's some level of power dynamics, where 1 group has all of the knowledge and context of the data, and then there's this other group that does stuff with the data. Right?
When you think about that, it's a very transactional relationship, and that's just the reality of that type of dynamic. The best organizations aren't a bunch of other folks looking at the data team. It's the data team and everyone else all looking at the data together and discussing the data together. Right? That's how you create shared consciousness over time, through the culture of the business and how it communicates and interacts: a shared space where everyone is talking about this really, really valuable and important asset without judgment and without power dynamics.
And there are probably, like, 2 organizations on the planet that have actually done this truly at this point. But to me, the question is, how do you invest in creating that shared consciousness about your data, which empowers everyone in the organization to act independently? When I think about our product, that's fundamentally what we're trying to facilitate. We're trying to create that common ground where people can see what questions were answered in the past, and it's available to them right there at their fingertips, in context with the data itself.
It's a place where people can answer new questions. It's a place where all of the interpersonal work that comes after the hard work of collecting, transforming, and analyzing data happens. We're trying to help that happen in a way that more resembles a special forces team as opposed to a factory where you've got a bunch of folks on a line turning rivets. We think about the ways that teams manage work and tasks and workflow, be it introducing a service desk or introducing agile project management methodology. There's nothing wrong with that; none of that is gonna go away. But it's very much born of the best practices of scientific management theory, and much less of thinking about fostering a culture of decision making.
[00:34:17] Unknown:
As you have been exploring this problem area and developing your product and working with your customers, I'm wondering what are some of the initial ideas about the ways that this problem manifests that have been challenged or updated as you dug deeper into the space and as the surrounding ecosystem has evolved and gone in different directions?
[00:34:41] Unknown:
So much to that question. You start by building something based on an idea you have, and you get feedback from people, and it takes you in a completely different direction. What I would say I underappreciated when starting on our journey was how acutely teams felt the asset management challenge of just maintaining the sprawl of stuff that lives within the modern organization, and this evolution from order to chaos, or entropy. We started very much thinking about, well, how do we help speed up and facilitate some of these workflows and the collaboration and consumption of data?
And that's as basic as: hey, you've got this asset, this dashboard, this report, whatever it is. How can you facilitate a conversation that lives directly in context? And people find that valuable. But what's been really interesting is, well, how do you then get your organization to a place where it's gonna go do that? Because there are other ways that teams manage that workflow today. And so a lot of the solutions we've offered around managing the life cycle of your data assets, either on a 1 off basis or automatically, that's really all been brought to us through customers.
A lot of the triangulation of what's happening upstream with the data has also been driven by customers: exposing issues proactively around data freshness or data quality so that you don't have to answer the question, hey, does this thing look right? Am I looking at the right data? We can help that business user answer that question for themselves. So that has been something I never thought we would dip our toes into, and it's been really awesome to watch all of the solutions in the transformation and observability space evolve, because they're really, really interesting integration points for us.
And I would say the last thing that was not on my radar, but makes sense for us, is understanding more about what folks are actually doing with your data products. If you think about a traditional product or a SaaS product, you'd have a whole host of tools out there to understand what features people are using, usage flows, and funnels. That can be everything from Amplitude or Pendo to something like Hotjar for click interactions, a decent number of different solutions, from your CDP to some of the tools I just mentioned, that you've gotta implement and maintain in order to make that work. But how do you understand which filter on a dashboard is used all the time? There's just really no way to do that.
We ended up developing a whole set of capabilities that offers what I'm describing, what you would have for understanding the value of a traditional product, but now for your data products. And you can see, hey, users are always clicking here, or they're getting stuck there, or these are the 5 reports that are used the most, and then you can understand exactly what the head of department X is actually doing with that thing. That's valuable more broadly as you try to manage all of your assets and do things like take them to end of life.
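A toy illustration of this kind of usage analytics: aggregate click events on data products into per-report and per-element counts. The event schema here is an assumption for illustration, not Workstream's actual one:

```python
from collections import Counter

# Hypothetical click-stream events captured from BI dashboards.
events = [
    {"report": "revenue", "element": "region-filter"},
    {"report": "revenue", "element": "region-filter"},
    {"report": "churn", "element": "date-picker"},
    {"report": "revenue", "element": "export-button"},
    {"report": "pipeline", "element": "stage-filter"},
]

# Which reports are viewed/interacted with the most?
report_usage = Counter(e["report"] for e in events)
# Which specific filter or control on a report is used all the time?
element_usage = Counter((e["report"], e["element"]) for e in events)

print(report_usage.most_common(1))   # → [('revenue', 3)]
print(element_usage.most_common(1))  # → [(('revenue', 'region-filter'), 2)]
```

In a real deployment these counts would be keyed by user and time window as well, which is what lets you answer "what was the head of department X actually doing with this dashboard?"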
But it also can be acutely helpful when someone says, hey, I don't understand this thing. You can then go in and actually see what that person was looking at and interacting with, and not struggle with, I have no idea how to recreate the exact state that this person was in when they experienced this problem. So to me, I think there's a lot more for us there. It's been fun to have customers pull us in these different directions.
[00:38:58] Unknown:
Yeah. I think that point of being able to have visibility into what that end user interaction looks like for the data products that you're producing, as somebody on the data team, is definitely largely a missing piece. To some extent, that's being addressed with some of the metadata catalog and lineage solutions, where you can see, okay, what is the popularity of a particular table? Where is it being used? But as you said, it gets more detailed than that, where, if you're in your BI platform and you say, okay, I've applied this filter and now everything looks weird, or somebody changed a particular pivot onto the wrong axis, then you can say, oh, okay, now I understand why you're having that problem. That's definitely a very valuable point worth highlighting, that that is an important piece of treating data as a product that has, to this point, largely been overlooked.
[00:39:46] Unknown:
Of course. And not to say that understanding the popularity of a specific table isn't valuable. Of course it's valuable, and there's a reason you'd wanna understand that. But from my perspective, that's a problem that's been solved, and these other problems haven't. People are now solving it in much more elegant ways. But when you think of data as a product, you can't build and maintain good products if you don't have information about how they're bringing value to your end users. And the interactions that you were just describing, around a pivot or a filter, are critically important. And again, I do believe data assets are products, and much of the work in a data team is like a product organization.
But every analogy only goes so far. When you think about data work, regardless of the team's size, it's some combination of product work, engineering work, and customer support work. And when you think about the product work and the customer support work, what are the capabilities that you need in order to fulfill those roles well? A lot of those are missing. As we think about our solution set, it's about plugging a lot of those gaps. On the support side, it's how do we, 1, try to change the paradigm, shift things more towards this touchy feely idea of shared consciousness that I'm talking about, and in the process speed up cycles and speed up workflow. And then on the product work, it's powering teams with the capabilities that a good product manager would want. Right?
A good product manager is gonna, like, know their customers and speak with them directly, and teams can and will continue to do that. But a lot of the programmatic quantitative information is just not available today.
[00:41:42] Unknown:
Data engineers don't enjoy writing, maintaining, and modifying ETL pipelines all day, every day, especially once they realize that 90% of all major data sources like Google Analytics, Salesforce, AdWords, Facebook, and spreadsheets are already available as plug and play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from over 40 countries to set up and run low latency ELT pipelines with 0 maintenance. Boasting more than 150 out of the box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines.
You get real time data flow visibility with fail safe mechanisms and alerts if anything breaks, preload transformations and auto schema mapping to precisely control how data lands in your destination, models and workflows to transform data for analytics, and reverse ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24/7 live support, makes it consistently voted by users as the leader in the data pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata today and sign up for a free 14 day trial that also comes with 24/7 support.
In your work of building Workstream and working with teams, trying to help them get that holistic view of their data assets and data usage and the overall life cycle of those pieces of information, what are some of the most interesting, innovative, or unexpected ways that you've seen Workstream applied?
[00:43:12] Unknown:
The first thing is just the stuff that people have asked us to support, even just in thinking about what counts as an asset. I've talked with best of breed teams who, for very specific reasons, are doing things that you would claim are not best of breed. Like building 1 off docs with tables copied and pasted from Looker, for example, that they go and present. There are reasons they do that. And so we've gotten a lot of pull to bring in things like that as a viable asset type, or data that's living within an operational system like your CRM or your customer support system.
So I think that has been interesting and a direction I hadn't foreseen. I would say 1 of the most innovative pieces, though, when I think of actual use cases and workflows, is the stuff that data teams do themselves internally within our product, using it for a lot of internal, very quick hit conversations. Seeing folks pull some of their workflows out of horizontal communication systems into our tool has been really, really interesting, and a lot of that's around feedback loops when developing something new.
[00:44:37] Unknown:
In your own work of building the platform and building the business, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:44:46] Unknown:
So many. I would say, and this is coming from someone who's generally very impatient as a person, that building products and having folks change behavior, even when they know the existing behavior is broken in some way, shape, or form, just takes a really, really long time, and you have to learn to be patient. That means a lot of different things, but it's especially true when building very technical products or workflow products that require folks to go from, hey, I'm doing this thing every day in this way, to doing it in a slightly different way. That takes time not only to build, but to get right, and it's very nuanced. For us, that means we literally obsess about every interaction that you can take in our product, because that experience is so, so important. So I would say that's the first 1.
It takes a long time, so plan for it to take a long time, and be patient with your users and with your team to get the product where it needs to be. That's the first thing. The second thing is there's a huge gap between what people say they're willing to do and what they're actually willing to do. And that's especially important when you're building a new product. It boils down to: you not only have to be solving a big problem, but your solution can't be just incrementally better than what already exists. It needs to be 10 times better. That's when it becomes a no brainer to not only try something new, but adopt something new. And so when someone says, you know, hey, this is a problem, this is interesting.
Treat that as warning bells, to an extent. It's a signal that you're onto something, but it doesn't mean you've necessarily come up with a compelling enough solution for that specific person. And then maybe the last thing would be: try not to obsess too much about what other folks are doing, especially folks who are building something tangentially related to you and seeing some success. Don't let that be a reason to move off your vision; there are a lot of other reasons you should potentially stick with your vision. And try not to pay too much attention to what more broadly is happening around you. Of course, we've lived through pretty turbulent and in many ways traumatic times the last few years.
We're in a very interesting macroeconomic environment right now, with the global economic slowdown. Those external factors matter, but they're fundamentally much less important than what you're actually seeing and what your relationship is with your customers. That really is the most important thing. Focus on that, and that becomes the guide for how to navigate the world around you.
[00:47:47] Unknown:
For people who are interested in being able to get more holistic visibility of their data assets and maintain that communication and context about them, what are the cases where Workstream is the wrong choice?
[00:48:00] Unknown:
I generally say, if you're a small organization, we're generally not gonna be a good fit for you. If you're a 30 person organization and you have a single person on your data team, feel free to try out our product. There's a free version, so go ahead and play around with it, but we're generally not the best fit. We're a better fit for larger, more complex organizations with bigger teams that are very data driven and are on the path towards this state of entropy that we've been talking about. We care about everyone, but those are the ones we're best suited for. I would also say you generally need to be an organization that has truly embraced the latest and greatest modern data technology and the modern data stack. Right?
And within that, we're generally not the first tool that you're gonna use. We're a year 2 type of solution that you'd introduce into your environment. Hypothetically, if you're setting up a new data stack, year 1 is setting up the data stack, and then years 2 through whatever are about managing it, maintaining it, and empowering the org. So folks are normally a little bit further along in their journey. It's less the, hey, this thing is getting rolled out alongside a brand new deployment of Looker or the BI solution of choice.
[00:49:26] Unknown:
And as you continue to build and iterate on the product and keep an eye towards what is coming down the pike in terms of the data ecosystem and the ways that people are using their data, what are some of the things you have planned for the near to medium term, or any particular problem areas that you're excited to dig into?
[00:49:43] Unknown:
Yeah. I think in general, 1 of my big theses, and I think you're already seeing it happen, is that we are increasingly headed towards a heterogeneous environment where more and more tools are gonna be used to analyze and consume data. We're going from more of a monolithic, 1 size fits all solution to, hey, we have all of these different jobs to be done, and we're gonna leverage the best in breed solution for each 1 of those. Maybe there's something you're using for exploratory analysis, like a notebook. Maybe you're using a BI solution for dashboarding.
Maybe you're using something to manage your metrics centrally. You're using something else to push data out into other systems. So I think we're gonna continue to see data proliferating in all these different places, and as that continues to evolve, we'll evolve alongside it. Generally, we want to be agnostic to what teams choose, so we wanna be able to support anything and everything under the sun that data teams embrace to analyze and consume data. And then the last thing is we're doubling down a lot, especially in the next year, on what we talked about around usage analytics of your data assets, and really understanding more and more about how end users are embracing what's been built for them.
There's a lot that we already do there today. It feels like a big missing part of the data team toolset, and we're excited to see customers embrace that more and to make what we already offer today more powerful.
[00:51:19] Unknown:
Are there any other aspects of this particular problem area of being able to gain cross cutting visibility of data and its usage within an organization or the work that you're doing at Workstream to support that that we didn't discuss yet that you would like to cover before we close out the show?
[00:51:36] Unknown:
If I could leave the listener with 1 thing on how you should think about us: we care fundamentally about people and what they do with your data, and how we empower everyone to do that better. We really think of our product as a collaboration product at its core, and that's what makes us excited. It's about bringing people together, and bringing people together around their data, which is even better. That's what gets me excited about talking to customers every day and seeing how they're embracing our solution.
[00:52:13] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:52:28] Unknown:
Yeah. So this is tangentially related to our problem area, but 1 of the things that I think is a huge gap, when you think about machine learning applications, is that there are really no machine learning applications around the operational aspects of data. By that, I mean: how do we programmatically let people know what's being said about the data and how it's been used in the past? I think in the next 10 years we're gonna see some really interesting evolutions in that space, which are gonna completely transform the way we interact with and relate to our data and what people are doing with it.
We have something we call the data concierge, but what I'm talking about is an actual automated concierge that can automatically give you answers about your data. And I'm excited to see who finally is able to bring that to market successfully, because I think, of anything, that's gonna be the biggest game changer, not in what data we have available, but in our ability to go ahead and action it.
[00:53:42] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Workstream and your vision of providing a unified context for data assets within an organization and how they're being used. I appreciate all of the time and energy that you and your team are putting into making that a reality, and I hope you enjoy the rest of your day. Thanks. You as well. Thank you for having me, and enjoy the rest of your day.
[00:54:10] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Introduction
Nicholas Freund's Background in Data
Overview of Workstream Product
Workstream vs. Data Catalogs
User Interactions and Interfaces
Challenges Without Workstream
Lifecycle Management of Data Assets
Technical Implementation and Integrations
Impact on Teams and Productivity
Evolving Problem Areas and Customer Feedback
Interesting Applications of Workstream
Lessons Learned in Building Workstream
When Workstream is Not the Right Choice
Future Plans and Roadmap
Final Thoughts and Closing Remarks