Summary
There is a constant tension in business data between growing silos and breaking them down. Even when a tool is designed to integrate information as a guard against data isolation, it can easily become a silo of its own, where you have to make a point of using it to seek out information. In order to help distribute critical context about data assets and their status into the locations where work is being done, Nicholas Freund co-founded Workstream. In this episode he discusses the challenge of maintaining shared visibility and understanding of data work across the various stakeholders, and his efforts to make it a seamless experience.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write workflows as code. Prefect specializes in gluing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.
- Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping to precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24×7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.
- Your host is Tobias Macey and today I’m interviewing Nicholas Freund about Workstream, a platform aimed at providing a single pane of glass for analytics in your organization
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Workstream is and the story behind it?
- What is the core problem that you are trying to solve at Workstream?
- How does that problem manifest for the different stakeholders in an organization?
- What are the contributing factors that lead to fragmentation of visibility for data workflows at different stages?
- What are the sources of information that you use to build a cohesive view of an organization’s data assets?
- What are the lifecycle stages of a data asset that are most often overlooked or un-maintained?
- What are the risks and challenges associated with retirement of a data asset?
- Can you describe how Workstream is implemented?
- How have the design and goals of the system changed since you first started it?
- What does the day-to-day interaction with workstream look like for different roles in a company?
- What are the long-range impacts on team behaviors/productivity/capacity that you hope to catalyze?
- What are the most interesting, innovative, or unexpected ways that you have seen Workstream used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Workstream?
- When is Workstream the wrong choice?
- What do you have planned for the future of Workstream?
Contact Info
- @nickfreund on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today, that's a t l a n, to learn more about how Atlan's Active Metadata platform is helping pioneering data teams like Postman, Plaid, WeWork, and Unilever achieve extraordinary things with metadata.
When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey, and today I'm interviewing Nicholas Freund about Workstream, a platform aimed at providing a single pane of glass for analytics in your organization. So, Nicholas, can you start by introducing yourself? Thanks for having me on the show. I'm Nick Freund. I'm the founder and CEO of Workstream.io. I started the company really born of
[00:01:47] Unknown:
so many pain points I had experienced in my career, first as an analytics person and then later an operator working really closely with our data function.
[00:01:55] Unknown:
And do you remember how you first got started working in data?
[00:01:58] Unknown:
Yeah. I mean, it really started at the very beginning of my career. I joined as one of the first analysts at Tesla, then called Tesla Motors. Back then, about 15 years ago, very early in the Tesla journey, I joined when there were about 200 people at the company, pre delivery on the original Roadster. Anyway, I was there for a long time supporting the operations and manufacturing teams, bringing the Tesla Roadster and Model S to market. And then more recently, I ran operations at a SaaS company here in New York, where I built lots of different functions and collaborated with and worked really closely with our data team and our business operations team. And we were just a very, very data driven culture, and we experienced lots of the same problems I had issues with around managing our analytics assets at Tesla.
Yeah. That's really the germ of what has become our company.
[00:02:52] Unknown:
And so can you describe a bit more about what the Workstream product is and some of the story behind how you decided to invest in actually building a business around it, and why this is the problem area that you wanted to spend your time and energy on? One way to think about our product is a single pane of glass for your analytics assets. We also frame it as the analytics hub. It really is a place
[00:03:14] Unknown:
where teams can bring together disparate data and analytics assets into a single unified repository. I mean, that includes, like, not only the assets themselves, but, like, all of the important and critical business context around it, like documentation and training content, which becomes really useful for all the other folks throughout the organization to be able to operationalize your assets. And so it serves as that kind of centralized single pane of glass access layer. But then it also facilitates collaboration and workflows all directly in context of your data.
So that's a little bit about our product. But, yeah, like, as I mentioned quickly before, it's really born of the issues I had experienced. Right? And I think the initial catalyst for me was some very acute pain points around my workflow as an operator working with our data team. And, you know, as simple as I think everyone's experienced, someone sends you, like, a screenshot of a dashboard in Slack, and you're going back and forth and having a conversation, trading insights and providing feedback, and it was a very disjointed, painful, and manual workflow.
But I'd also experienced all of these problems around finding analytics assets that either I had created or others had created and supplied to me, and you can never find what you were looking for. And even more importantly, you couldn't remember, like, where were we when we were looking at this last? And, like, what were we doing with the data? It's those pieces of not just what the data is, but what is the business doing with it, that personally really fascinate me. And, you know, I felt at the time, and I still do, that there had been relatively little investment by companies or by products in kind of solving those last mile issues.
And I felt like there was a real opportunity to kind of solve those key pain points for data teams and, honestly, everyone in the organization who works closely with your data. That was really how I got passionate about it, just feeling these problems myself. And I met with other entrepreneurs who had done similar things, really not in the data and analytics space, but really thinking about specific workflows for specific personas within organizations, and how you could facilitate workflows kind of around the existing productivity tools that they already use. Of which, of course, there are so many that data people use, from your warehouse to whatever you're using for transformations or data pipelines, etcetera, etcetera. And so I thought no one has really looked at how do we wrap or extend all of these really powerful technical tools for the business user, and build something that's truly integrated and dynamic. So, yeah, that's a quick early story of how I came upon the problem area and then why I decided to dedicate the next phase of my career to solving some of those problems.
[00:06:17] Unknown:
In that context of being able to build a single repository of information about all of the different data assets that you have and information about them, from that framing, it sounds similar to some of the different sort of metadata catalog, data discovery platforms that are out there. And I'm wondering what you see as the missing piece of that approach that leaves out the business users and stakeholders and some of the ways that your work at Workstream is either a different approach to that or is maybe a step above that kind of metadata catalog, data discovery repository.
[00:06:58] Unknown:
With very knowledgeable folks like yourself, like, one of the first questions I get is, like, oh, is this a data catalog? Right? And the short answer is no. It's not. You know, we know what data catalogs are. We've talked to many, many folks who use them. We have customers who also use data catalogs, and it's a big problem space. But, fundamentally, we're trying to build something different. And the first way to think about it is that our perspective really is from the data team to the business, and then the business back to the data team. Right? And some of the things that are unique in kind of what we're doing: as an example, we don't map all of the tables in your data warehouse and help you build out documentation around the columns in a table. Like, there are reasons that you would wanna go ahead and do stuff like that. But, fundamentally, those are for people who can write SQL. Right? And can, like, build analytics themselves. And so a data catalog is very much, from our perspective, a product that's built for, really, the technical users or, like, citizen analysts throughout the business.
Salespeople generally aren't gonna have access to your data catalog. Right? Customer success managers or product managers might, but frontline business folks aren't. They're normally gonna see the output of what the analytics team creates, like the actual data products themselves. So, really, our integration layer is all around the data products. Right? And that can be everything from your BI solution and all the dashboards and reports that exist within it, or your multiple BI solutions. We're very pragmatic that not everything is done to best practice. Right? So, like, what about all those random spreadsheets that are, like, floating around throughout your organization? Right? Or what about the complex recommendations and insights that might just live in a document? Right? We treat that as a first class asset as well. What about operational data that's getting pushed into various SaaS applications? All of these are what we would call an asset in our system.
And so what we do is really provide a unified repository at that layer, and then we pick up and help facilitate workflows from there to the business. Right? And so that could be, for example, a training video, so all of the customer success managers know how to use the customer 360 dashboard and all the other data assets that are available to them. And what's unique about our product is it's all directly integrated with the tools that teams already use. And so, for example, if you are then in Salesforce looking at, you know, live data that's been pushed into an account, or even some analytics that might have been built natively within Salesforce, our concierge, called the Data Concierge, brings critical documentation and context into the consumption layer alongside the tools that teams already use.
So the repository is there as an access layer for more mature organizations, and you can almost think of it as a single drive for business users to go ahead and access this stuff. But then they can still engage with the data in the systems that they already use, and we're really then augmenting that from a workflow perspective. So those are some of the ways that we think that we're different. And I think the way I'll wrap it up is: we look at our deployments. Right? We land with, like, very, very technical users normally, like the data teams, the analytics teams. But the majority of our users are nontechnical users. Right? You end up with a deployment, and there are hundreds of folks who, like, have never written a line of code in their entire life. And so it becomes very much this kind of internal network around
[00:10:38] Unknown:
accessing data and collaborating on it. As far as those interactions, to your point, you know, it's the technical users who bring in the product, and then the, I don't know if nontechnical is necessarily the most appropriate term, but the people who don't have technology as their core focus, are the ones who are going to be interacting with it more predominantly. And I'm wondering what are some of the types of interfaces that you're integrating with to be able to provide that interaction, and the types of information that those end users are looking for when they decide that, oh, Workstream is the right solution for me, because I just wanna know, you know, did this table get updated? Is this spreadsheet using the most current version of our sales figures? Whatever. You know, just curious if you can talk to some of that kind of user interaction and the types of information that they're looking to get at and understand as they're doing their day to day job, and how you're providing that at Workstream.
[00:11:33] Unknown:
Yeah. What's really fascinating to me is just, like, we are seeing more and more how customers are using the product in ways that we didn't even expect when we built it, despite the nature of it. Where we get teams really engaged is when it dawns on them that, hey, this is, like, a single place for us to go to have access to all of our, like, analytics. Right? And then I don't have to, like, shuffle between tabs, where they have to go to the shared drive for this one thing, and they've gotta go to the native directory within Tableau.
That's how they operate. Right? Or there's data that lives in Salesforce, and they're bookmarking that within their browser. So we're kind of displacing these behaviors of, like, where is all of this stuff, and it's fragmented across their processes. And so it creates kind of lots of those pain points. And so that all now becomes kind of consolidated in a single place. With regards to, like, the interfaces themselves, and this is, like, tangentially relevant to the audience, but, like, what's interesting about our product is it's actually incredibly web development heavy. And so a lot of the complexity of how our product is implemented under the hood is, like, how do we interoperate across all of these user interfaces of, like, different tools?
We have a web app. And so, again, you can think about this as the drive that you go to to, like, access content and documentation. So you can find things via the library. You can search for all of the things that you would expect. Teams can build out collections of assets for various teams and end users. And there's some things that you can go ahead and also then view within our product, and there's lots of analytics assets that are designed in this way. Not specifically for this use case, but, like, you can view a dashboard in a web app. But in a lot of cases, the vast majority of them, that's not how things work. And so in that case, we'll kind of send you back out to the actual source of truth, right, the actual data tool, and then we have our Chrome extension that lives alongside it that basically brings our experience kind of into that native interface.
And so it's kind of complicated under the hood from a development perspective, but it comes off as pretty seamless kind of to the end users. Now I think for specific use cases, what we're seeing a lot right now are data teams finding success using this to enable kind of more complex go to market teams, larger go to market teams. Now you've got an organization with hundreds of folks supporting customers, or hundreds of folks working in a sales and marketing capacity, and
[00:14:15] Unknown:
there's just a lot of manual back and forth, or, like, training meetings teams have to set up to teach folks how to get up to speed and use what's been created for them. And we can streamline all of that and save data teams, as well as everyone else, lots of cycles. And, yeah, I generally try not to, to your point, refer to them as nontechnical folks. Stakeholders of the data team come in, like, all shapes and sizes and forms. Right? They're just the others in the organization whom you work with. Right? And for us, for our case right now, you know, those are often kind of these types of folks. And the use cases start with consuming curated knowledge, but then it extends to collaboration.
And that could be everything from, hey, like, I have a question on this one specific thing that I'm seeing, and that conversation all happens directly in context. Right? And teams can extend that with, like, rich annotations, drawing on top of what they're seeing. They can include video content, a more complex explanation. And those workflows kinda live across the life cycle of your analytics assets. Right? So it's everything from something that you're building brand new, that is in development and you're working on collaboratively, extended all the way to end of life as well.
[00:15:39] Unknown:
As far as the kind of flow of people's work and the ways that they are interacting with the different data and trying to get insights about the underlying assets: for people who aren't using something like Workstream, I'm wondering what you see as some of the main points of friction and the sources of fragmentation in the availability of that information, or some of the challenges that they have to overcome in being able to gain that same level of insight in their work of just being able to work with the data, understand the
[00:16:10] Unknown:
context of the data, sort of the freshness, quality, things like that. You know, there are ways of trying to solve some of these problems that we help you solve. Right? And the easiest way to think about it is that teams would introduce new tools into their environment, or they would repurpose tools to accomplish this. Right? So if you're thinking about documentation, there's a lot of different ways that you could build out business facing documentation. That could be, like, a doc, like a Google Doc that lives in Drive. That could be something in your intranet. Right? There's all of the tools they would use for building out kind of internal documentation. There's lots of different ways that folks would accomplish that, but there's nothing about those tools that's designed specifically for that use case. Right? If you're thinking about collaboration, well, there's a lot of different ways that data teams collaborate with business stakeholders.
And a classic one that I talk about all the time is fulfilling requests. Every data person's "favorite" thing, and I'm saying favorite with air quotes, is, like, accepting requests from business folks and then, like, delivering on them. Like, can you build me this new dashboard? And there's two main forms of this that I see. There's a Slack channel, data-marketing, data-whatever-function, and that serves as, like, a way for people to ask questions. So that's one end of the spectrum. And then the other side of the spectrum is, like, you use some type of ticketing or service desk system. You, like, introduce Jira Service Desk into your environment.
Two very different strategies there that come with a number of different problems. One really fits into the existing, like, agile workflow of the data team. It's asynchronous in nature, and it allows kind of that team to fit that into their kind of existing work. The other is probably more collaborative, but it's interrupt driven and it's all synchronous. Right? So there's two different, like, sets of inherent problems or trade offs that teams have to make there. So those are the other tools they would use for kind of stakeholder collaboration.
And then the last thing would be, like, just a good old, honest, like, meeting, which, like, nobody enjoys sitting in meetings all day long. Right? And if you're a high growth company, and people are joining all the time, you're joining meetings all of the time to train new people to use what has been built for them. Right? And so you can get along doing that, but it's not, like, a good use of anyone's time. Right? And these are incredibly, like, smart, quite frankly, well compensated individuals who really have better stuff to do.
And so why should they be focused on the rote when you can have them be focused on the strategic, and have them be focused on, like, building data products? Right? So we're trying to help, like, help with that. And in that way, you can think of our product as being, like, a pretty unsexy product in a lot of ways. Right? And we fully embrace that. I think the reason why a company would move off of kind of stitching together some of these workflows by repurposing tools to something dedicated like Workstream is that they've gotten to a point where they're at the size and complexity that something like this solves an acute enough pain point. Right?
And so that's normally when you can see people's interest level change, right, from being like, hey, this is really interesting, I can see us using the freemium version of your product, like, here and there, to, this will actually be transformational for us.
[00:19:57] Unknown:
Prefect is the data flow automation platform for the modern data stack, empowering data practitioners to build, run, and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn't get in your way, Prefect is the only tool of its kind to offer the flexibility to write workflows as code. Prefect specializes in gluing together the disparate pieces of a pipeline, integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100,000,000 business critical tasks a month. For more information on Prefect, go to dataengineeringpodcast.com/prefect today. That's prefect.
Another interesting aspect of the question of working with data assets, particularly given the number of different owners of a particular piece of information and the different people who are interacting with it, is the question of which life cycle stages are most often overlooked, under maintained, or completely ignored. I mean, one of the big problems we see
[00:21:13] Unknown:
customers having is just the concept of data asset sprawl or dashboard sprawl. And this is even worse in an organization that's invested a lot in self-service analytics. So if you're an organization that uses something like Looker, as an example, like, you can have thousands and thousands of dashboards floating around out there. The reason I bring that up is I think the concept of life cycle management is really interesting when you think about it as a potential solution to reining in something like dashboard sprawl. Right? To me, the most underrated, or at least under maintained, phases are the initial stages of developing something, building something new, and specifically, like, the workflow around that. And if you believe that the output of a data team is a product, right, the data person is both the product manager, the actual engineer that's building it, and then the, like, customer support person who triages bugs and then fixes it. And so there's a lot that's put on that person.
So how does that data professional live best practices from, like, a product management perspective? And a lot of that is about, like, listening to the customer, working collaboratively with the customer to design and build a solution that's gonna meet their needs. And so where I think there's underappreciation is just how difficult the act of building is. Of course, that is difficult in many different ways, shapes, and forms, but the interpersonal pieces of, like, product management, it's just a lot for teams to manage. So to me, I think that's an area we need to spend more time on. And I think if we're scoping things out better and building better data products more collaboratively, you're gonna need to build fewer over time.
I would then say the last piece is end of life. Right? And I think teams spend a lot of time building and less time taking things to end of life, and a lot of that's just because you quite frankly don't have the time for it. And when you've enabled self-service analytics, like, is it your job to maintain all of the random stuff that, like, any citizen analyst in your org can create? I mean, that's a really interesting question. Right? Like, whose job is it to govern? There's lots of different ways that we've thought about that last piece, and that's really interesting and fascinating. Like, look, in our product, you can do everything as basic as, like, marking something as archived or expired.
Or you can say, hey, this thing is expiring. Like, the data is temporal, and you shouldn't trust it after this date. And we'll, like, alert you and send all these flags to, like, the end user that, hey, don't trust this thing. More broadly, like, you follow, like, the 80 20 rule. 20% of what's there is good and 80% is bad. So how do you discover what those 80% are? And so we think a lot about, well, how do we help teams triangulate the qualitative information they get with, like, the quantitative, which is a lot more actionable. So, like, how do we help teams en masse take assets to end of life based off of what people are using? You know? And I think there's more work for us to do there as a product, but I think that's an area that really has not been solved. And I think the overall state of the data environment is this trend from order to entropy.
And that momentum is very hard to curtail. And what it results in is that, like, yearly or, like, quarterly project of, like, going in and, like, deleting stuff manually for, like, a week. Right? Because the problem has gotten so out of hand. And so how can we make that all happen automatically? Right? Because, again, the project of, like, deleting the reports and dashboards and whatever they are, like, that's not a good use of anyone's time.
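The usage-based end-of-life triage described above, flagging the assets nobody looks at so they can be retired en masse, can be sketched in a few lines. This is a hypothetical illustration, not Workstream's actual logic: the field names (`last_viewed`, `views_90d`) and thresholds are assumptions standing in for whatever a BI tool's audit logs actually expose.

```python
from datetime import date, timedelta

def flag_stale_assets(assets, today, max_idle_days=90, min_views=5):
    """Return names of assets that look like end-of-life candidates:
    either not viewed within the idle window, or barely viewed at all."""
    cutoff = today - timedelta(days=max_idle_days)
    stale = []
    for asset in assets:
        idle = asset["last_viewed"] < cutoff          # nobody opened it recently
        unused = asset["views_90d"] < min_views       # almost no usage overall
        if idle or unused:
            stale.append(asset["name"])
    return stale

# Illustrative usage records, as might be pulled from a BI tool's audit API
assets = [
    {"name": "customer_360", "last_viewed": date(2022, 9, 1), "views_90d": 240},
    {"name": "old_q1_report", "last_viewed": date(2022, 2, 15), "views_90d": 0},
    {"name": "adhoc_churn", "last_viewed": date(2022, 8, 20), "views_90d": 2},
]

print(flag_stale_assets(assets, today=date(2022, 9, 10)))
# → ['old_q1_report', 'adhoc_churn']
```

In practice the quantitative signal would be combined with the qualitative one mentioned in the interview, for example requiring an owner's sign-off before anything is actually archived.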
[00:25:13] Unknown:
In terms of the Workstream implementation details and some of the
[00:25:19] Unknown:
types of integrations that you need to build and maintain for people to be able to get a broad enough coverage of their different systems and end user interfaces that they're trying to work with. So our integrations, technically, from an implementation perspective: we generally tie in at the API level to your, like, various systems and tools. Right? The first category that we think about is, like, the assets themselves. So we'll connect in to your BI tool, or your multiple BI solutions, and all of the various places that you have data assets. And again, that could even be, like, operational systems where you're pushing data. Right? So you can connect us to those things, and then you can even manually add those assets into our system as well, like, as a one-off when you might be building it. And that's literally as simple as, like, you grab the link to the thing and you just register that asset.
And so that's kind of the first step: helping teams get their stuff into our repository. I would say for the next big category of tools that we tie into, we think of that next layer less as being the data warehouse itself; it's more the solutions that already exist that tie into your data warehouse. A great example here is we have a dbt Cloud integration. You can connect us to your dbt Cloud project, and we'll help you do a lot around triangulating data quality and data freshness across all of those various assets, informing both you and your stakeholders when there are underlying issues. And there's a really interesting roadmap for us there around data observability solutions.
And then we also tie into any of the workflow and communication tools that you already use. That could be your messaging solution like Slack. That could be your team's agile project management tool. So we can connect, as an example, with your Jira project, and if folks are spotting bugs in a dashboard, they can start a conversation with you, and it will automatically create a ticket in your backlog. The benefit of something like that is the collaboration is now all happening in context with the live data, but you can prioritize the delivery of that work alongside all of your other work, and we'll sync your statuses and all that good stuff back and forth. So the implementation really starts with connecting us with those tools you use. And if you go to our integrations page, workstream.io/integrations, you'll see that we already have 20 plus integrations with more coming every day. I would generally say that the minimum required to get value is 2 connections. Right? So you connect us to your BI tool and dbt.
You're probably gonna be able to find value out of our solution that you wouldn't be able to otherwise. Or you connect us to your BI tool and your Jira projects. That's also a minimum viable workspace or deployment for us. But the more that ends up in our system, the more that we can live up to that single pane of glass vision and that analytics hub vision. That covers a bunch of the technical implementation details. As for actually rolling our solution out, we work closely with customers. It's self-service, and you can go play around with the product on your own, but we normally tell customers to start simple. Start small. Right?
So pick a small pilot group of users that you're experiencing some of these pain points around, and just focus on the top data assets you get questions on all the time or that you wanna onboard those users onto. Add those into our system, build out some documentation, and then you're pretty much ready to roll this thing out. The users, and any of those others throughout the organization, can join that workspace and access what you've curated for them literally just by going to app.workstream.io and using Google single sign-on to get access and auto-join your workspace.
So there's some training that's involved in just getting the initial deployment up to speed. We help folks with that, or folks self-service there. But once you're up and running, it's pretty seamless for others to get access.
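As a rough sketch of the Jira flow described above, an in-context conversation on a dashboard could be translated into a create-issue payload. The conversation structure here is hypothetical, not Workstream's real schema; the payload follows the general shape of Jira's REST create-issue body:

```python
# Hypothetical: turn an in-context dashboard conversation into a Jira issue
# payload. Field names follow the general Jira REST create-issue shape.
def jira_issue_from_conversation(conversation: dict, project_key: str) -> dict:
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Bug"},
            "summary": f"[{conversation['asset_name']}] {conversation['title']}",
            "description": (
                f"Reported by {conversation['reporter']} on "
                f"{conversation['asset_url']}\n\n{conversation['body']}"
            ),
        }
    }

payload = jira_issue_from_conversation(
    {
        "asset_name": "Revenue Dashboard",
        "title": "Totals look doubled this week",
        "reporter": "pat@example.com",
        "asset_url": "https://bi.example.com/dashboards/7",
        "body": "Week-over-week revenue jumped 2x with no campaign change.",
    },
    project_key="DATA",
)
print(payload["fields"]["summary"])
# → [Revenue Dashboard] Totals look doubled this week
```

The payload would then be POSTed to Jira's create-issue endpoint, with the resulting issue key synced back onto the conversation so status updates flow both ways.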
[00:29:44] Unknown:
In terms of the impact on teams of having this way to unify visibility of the different assets that they're working with and the different data that's within the company: as you mentioned, 1 of the things that you're hoping to do is free up a lot of the engineers' time from some of the toil that's involved with just being able to share information about what data there is, and free up time in meetings no matter how many donuts there might be. I'm wondering what you see as the desirable long range impact on the behaviors, productivity, and capacity of both data teams and the broader organization as they get into the flow of using Workstream and being able to popularize that information without having to have as much manual involvement.
[00:30:33] Unknown:
So, putting Workstream aside for a second, I think 1 of the really interesting questions that gets asked is, how do you measure the success of a data team? And there are lots of different answers to that, lots of different ways you could measure it, and hot takes that are controversial. 1 that I actually do believe is: how successful has your data team been at creating shared consciousness about your data within the organization? And what is shared consciousness? What does that actually mean? It's an intentionally fluffy and squishy term. In its simplest form, it means that everyone in your organization has sufficient empathy for everyone else and understanding of the data that they can do their work independently.
Right? And fundamentally, in a modern organization, every decision, every action that somebody takes really should be informed by the data. And if it's not, it's probably a subpar action. Everyone talks about being data driven, but actually living up to that standard is very, very difficult. And you're never gonna get there if you're living in some version of a service model, or a model where there's some level of power dynamics, where 1 group has all of the knowledge and context of the data, and then there's this other group that does stuff with the data. Right?
When you think about that, it's a very transactional relationship, and that's just the reality of that type of dynamic. The best organizations aren't a bunch of other folks looking at the data team. It's the data team and everyone else all looking at the data together and discussing the data together. Right? That's how you create shared consciousness over time, through the culture of the business and how it communicates and interacts: a shared space where everyone is talking about this really, really valuable and important asset without judgment and without power dynamics.
And there are probably, like, 2 organizations on the planet that have actually done this truly at this point. But to me, the question is, how do you invest in creating that shared consciousness about your data, which empowers everyone in the organization to act independently? When I think about our product, that's fundamentally what we're trying to facilitate. We're trying to create that common ground where people can see what questions were answered in the past, and it's available to them right there at their fingertips, in context with the data itself.
It's a place where people can answer new questions. It's a place where all of the interpersonal work that comes after the hard work of collecting, transforming, and analyzing data happens. We're trying to help that happen in a way that more resembles a special forces team as opposed to a factory where you've got a bunch of folks on a line turning rivets. We think about the ways that teams manage work and tasks and workflow, be it introducing a service desk or introducing agile project management methodology. There's nothing wrong with that; none of that is gonna go away. But it's very much born of the best practices of scientific management theory, and much less of thinking about fostering a culture of decision making.
[00:34:17] Unknown:
As you have been exploring this problem area and developing your product and working with your customers, I'm wondering what are some of the initial ideas about the ways that this problem manifests that have been challenged or updated as you dug deeper into the space and as the surrounding ecosystem has evolved and gone in different directions?
[00:34:41] Unknown:
So much to that question. You start by building something based on an idea you have, and you get feedback from people, and it takes you in a completely different direction. What I would say I underappreciated when starting on our journey was how acutely teams felt the asset management challenge of just maintaining the sprawl of stuff that lives within the modern organization, and this evolution from order to chaos, or entropy. We started very much thinking about, well, how do we help speed up and facilitate some of these workflows and the collaboration and consumption of data?
And that's as basic as: hey, you've got this asset, this dashboard, this report, whatever it is. How can you facilitate a conversation that lives directly in context? And people find that valuable. But what's been really interesting is, well, how do you then get your organization to a place where it's gonna go do that? Because there are other ways that teams manage that workflow today. And so a lot of the solutions we've offered around managing the life cycle of your data assets, either on a 1 off basis or automatically, that's really all been brought to us through customers.
A lot of the triangulation of what's happening upstream with the data has also been driven by customers: exposing issues proactively around data freshness or data quality so that you don't have to answer the question, hey, does this thing look right? Am I looking at the right data? We can help that business user answer that question for themselves. So that has been something I never thought we would dip our toes into, and it's been really awesome to watch all of the solutions in the transformation and observability space evolve, because they're really, really interesting integration points for us.
And I would say the last thing that was not on my radar, but makes sense for us, is understanding more about what folks are actually doing with your data products. If you think about a traditional product or a SaaS product, you'd have a whole host of tools out there to understand what features people are using, usage flows, and funnels. That can be everything from Amplitude or Pendo to something like Hotjar for click interactions, a decent number of different solutions, from your CDP to some of the tools I just mentioned, that you've gotta implement and maintain in order to make that work. But how do you understand which filter on a dashboard is used all the time? There's just really no way to do that.
We ended up developing a whole set of capabilities that offers what I'm describing, what you would have for understanding the value of a traditional product, but now for your data products. And you can see, hey, users are always clicking here, or they're getting stuck there, or these are the 5 reports that are used the most, and then you can understand exactly what the head of department X is actually doing with that thing. That's valuable more broadly as you try to manage all of your assets and do things like take them to end of life.
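A toy illustration of this kind of usage analytics: aggregate click events on data products into per-report and per-element counts. The event schema here is an assumption for illustration, not Workstream's actual one:

```python
from collections import Counter

# Hypothetical click-stream events captured from BI dashboards.
events = [
    {"report": "revenue", "element": "region-filter"},
    {"report": "revenue", "element": "region-filter"},
    {"report": "churn", "element": "date-picker"},
    {"report": "revenue", "element": "export-button"},
    {"report": "pipeline", "element": "stage-filter"},
]

# Which reports are viewed/interacted with the most?
report_usage = Counter(e["report"] for e in events)
# Which specific filter or control on a report is used all the time?
element_usage = Counter((e["report"], e["element"]) for e in events)

print(report_usage.most_common(1))   # → [('revenue', 3)]
print(element_usage.most_common(1))  # → [(('revenue', 'region-filter'), 2)]
```

In a real deployment these counts would be keyed by user and time window as well, which is what lets you answer "what was the head of department X actually doing with this dashboard?"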
But it also can be acutely helpful when someone says, hey, I don't understand this thing. You can then go in and actually see what that person was looking at and interacting with, and not struggle with, I have no idea how to recreate the exact state that this person was in when they experienced this problem. So to me, I think there's a lot more for us there. It's been fun to have customers pull us in these different directions.
[00:38:58] Unknown:
Yeah. I think that point of being able to have visibility into what that end user interaction looks like for the data products that you're producing, as somebody on the data team, is definitely largely a missing piece. To some extent, that's being addressed with some of the metadata catalog and lineage solutions, where you can see, okay, what is the popularity of a particular table? Where is it being used? But as you said, it gets more detailed than that, where, if you're in your BI platform and you say, okay, I've applied this filter and now everything looks weird, or somebody changed a particular pivot onto the wrong axis, then you can say, oh, okay, now I understand why you're having that problem. That's definitely a very valuable point worth highlighting, that that is an important piece of treating data as a product that has, to this point, largely been overlooked.
[00:39:46] Unknown:
Of course. And not to say that understanding the popularity of a specific table isn't valuable. Of course it's valuable, and there's a reason you'd wanna understand that. But from my perspective, that's a problem that's been solved, and these other problems haven't. People are now solving it in much more elegant ways. But when you think of data as a product, you can't build and maintain good products if you don't have information about how they're bringing value to your end users. And the interactions that you were just describing, around a pivot or a filter, are critically important. And again, I do believe data assets are products, and much of the work in a data team is like a product organization.
But every analogy only goes so far. When you think about data work, regardless of the team's size, it's some combination of product work, engineering work, and customer support work. And when you think about the product work and the customer support work, what are the capabilities that you need in order to fulfill those roles well? A lot of those are missing. As we think about our solution set, it's about plugging a lot of those gaps. On the support side, it's how do we, 1, try to change the paradigm, shift things more towards this touchy feely idea of shared consciousness that I'm talking about, and in the process speed up cycles and speed up workflow. And then on the product work, it's powering teams with the capabilities that a good product manager would want. Right?
A good product manager is gonna, like, know their customers and speak with them directly, and teams can and will continue to do that. But a lot of the programmatic quantitative information is just not available today.
[00:41:42] Unknown:
Data engineers don't enjoy writing, maintaining, and modifying ETL pipelines all day, every day, especially once they realize that 90% of all major data sources like Google Analytics, Salesforce, AdWords, Facebook, and spreadsheets are already available as plug and play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from over 40 countries to set up and run low latency ELT pipelines with 0 maintenance. Boasting more than 150 out of the box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines.
You get real time data flow visibility with fail safe mechanisms and alerts if anything breaks, preload transformations and auto schema mapping to precisely control how data lands in your destination, models and workflows to transform data for analytics, and reverse ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24/7 live support, makes it consistently voted by users as the leader in the data pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata today and sign up for a free 14 day trial that also comes with 24/7 support.
In your work of building Workstream and working with teams, trying to help them get that holistic view of their data assets and data usage and the overall life cycle of those pieces of information, what are some of the most interesting, innovative, or unexpected ways that you've seen Workstream applied?
[00:43:12] Unknown:
The first thing is just the stuff that people have asked us to support, even just in thinking about what counts as an asset. I've talked with best of breed teams who, for very specific reasons, are doing things that you would claim are not best of breed. Like building 1 off docs with tables copied and pasted from Looker, for example, that they go and present. There are reasons they do that. And so we've gotten a lot of pull to bring in things like that as a viable asset type, or data that's living within an operational system like your CRM or your customer support system.
So I think that has been interesting and a direction I hadn't foreseen. I would say 1 of the most innovative pieces, though, when I think of actual use cases and workflows, is the stuff that data teams do themselves internally within our product, using it for a lot of internal, very quick hit conversations. Seeing folks pull some of their workflows out of horizontal communication systems into our tool has been really, really interesting, and a lot of that's around feedback loops when developing something new.
[00:44:37] Unknown:
In your own work of building the platform and building the business, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:44:46] Unknown:
So many. I would say, and this is coming from someone who's generally very impatient as a person, that building products and having folks change behavior, even when they know the existing behavior is broken in some way, shape, or form, just takes a really, really long time, and you have to learn to be patient. That means a lot of different things, but it's especially true when building very technical products or workflow products that require folks to go from, hey, I'm doing this thing every day in this way, to doing it in a slightly different way. That takes time not only to build, but to get right, and it's very nuanced. For us, that means we literally obsess about every interaction that you can take in our product, because that experience is so, so important. So I would say that's the first 1.
It takes a long time, so plan for it to take a long time, and be patient with your users and with your team to get the product where it needs to be. That's the first thing. The second thing is there's a huge gap between what people say they're willing to do and what they're actually willing to do. And that's especially important when you're building a new product. It boils down to: you not only have to be solving a big problem, but your solution can't be just incrementally better than what already exists. It needs to be 10 times better. That's when it becomes a no brainer to not only try something new, but adopt something new. And so when someone says, you know, hey, this is a problem, this is interesting.
Treat that as warning bells, to an extent. It's a signal that you're onto something, but it doesn't mean you've necessarily come up with a compelling enough solution for that specific person. And then maybe the last thing would be: try not to obsess too much about what other folks are doing, especially folks who are building something tangentially related to you and seeing some success. Don't let that be a reason to move off your vision; there are a lot of other reasons you should potentially stick with your vision. And try not to pay too much attention to what more broadly is happening around you. Of course, we've lived through pretty turbulent and in many ways traumatic times the last few years.
We're in a very interesting macroeconomic environment right now, with the global economic slowdown. Those external factors matter, but they're fundamentally much less important than what you're actually seeing and what your relationship is with your customers. That really is the most important thing. Focus on that, and that becomes the guide for how to navigate the world around you.
[00:47:47] Unknown:
For people who are interested in being able to get more holistic visibility of their data assets and maintain that communication and context about them, what are the cases where Workstream is the wrong choice?
[00:48:00] Unknown:
I generally say, if you're a small organization, we're generally not gonna be a good fit for you. If you're a 30 person organization and you have a single person on your data team, feel free to try out our product. There's a free version, so go ahead and play around with it, but we're generally not the best fit. We're a better fit for larger, more complex organizations with bigger teams that are very data driven and are on the path towards this state of entropy that we've been talking about. We care about everyone, but those are the ones we're best suited for. I would also say you generally need to be an organization that has truly embraced the latest and greatest modern data technology and the modern data stack. Right?
And within that, we're generally not the first tool that you're gonna use. We're a year 2 type of solution that you'd introduce into your environment. Hypothetically, if you're setting up a new data stack, year 1 is setting up the data stack, and then years 2 through whatever are about managing it, maintaining it, and empowering the org. So folks are normally a little bit further along in their journey. It's less the, hey, this thing is getting rolled out alongside a brand new deployment of Looker or the BI solution of choice.
[00:49:26] Unknown:
And as you continue to build and iterate on the product and keep an eye towards what is coming down the pike in terms of the data ecosystem and the ways that people are using their data, what are some of the things you have planned for the near to medium term, or any particular problem areas that you're excited to dig into?
[00:49:43] Unknown:
Yeah. I think in general, 1 of my big theses, and I think you're already seeing it happen, is that we are increasingly headed towards a heterogeneous environment where more and more tools are gonna be used to analyze and consume data. We're going from more of a monolithic, 1 size fits all solution to, hey, we have all of these different jobs to be done, and we're gonna leverage the best in breed solution for each 1 of those. Maybe there's something you're using for exploratory analysis, like a notebook. Maybe you're using a BI solution for dashboarding.
Maybe you're using something to manage your metrics centrally. You're using something else to push data out into other systems. So I think we're gonna continue to see data proliferating in all these different places, and as that continues to evolve, we'll evolve alongside it. Generally, we want to be agnostic to what teams choose, so we wanna be able to support anything and everything under the sun that data teams embrace to analyze and consume data. And then the last thing is we're doubling down a lot, especially in the next year, on what we talked about around usage analytics of your data assets, and really understanding more and more about how end users are embracing what's been built for them.
There's a lot that we already do there today. It feels like a big missing part of the data team toolset, and we're excited to see customers embrace that more and to make what we already offer today more powerful.
[00:51:19] Unknown:
Are there any other aspects of this particular problem area of being able to gain cross cutting visibility of data and its usage within an organization or the work that you're doing at Workstream to support that that we didn't discuss yet that you would like to cover before we close out the show?
[00:51:36] Unknown:
If I could leave the listener with 1 thing on how you should think about us: we care fundamentally about people and what they do with your data, and how we empower everyone to do that better. We really think of our product as a collaboration product at its core, and that's what makes us excited. It's about bringing people together, and bringing people together around their data, which is even better. That's what gets me excited about talking to customers every day and seeing how they're embracing our solution.
[00:52:13] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:52:28] Unknown:
Yeah. So this is tangentially related to our problem area, but 1 of the things that I think is a huge gap, when you think about machine learning applications, is that there are really no machine learning applications around the operational aspects of data. By that, I mean: how do we programmatically let people know what's being said about the data and how it's been used in the past? I think in the next 10 years we're gonna see some really interesting evolutions in that space, which are gonna completely transform the way we interact with and relate to our data and what people are doing with it.
We have something we call the data concierge, but what I'm talking about is an actual automated concierge that can automatically give you answers about your data. And I'm excited to see who finally is able to bring that to market successfully, because I think, of anything, that's gonna be the biggest game changer, not in what data we have available, but in our ability to go ahead and action it.
[00:53:42] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Workstream and your vision of providing a unified context for data assets within an organization and how they're being used. I appreciate all of the time and energy that you and your team are putting into making that a reality, and I hope you enjoy the rest of your day. Thanks. You as well. Thank you for having me, and enjoy the rest of your day.
[00:54:10] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Introduction
Nicholas Freund's Background in Data
Overview of Workstream Product
Workstream vs. Data Catalogs
User Interactions and Interfaces
Challenges Without Workstream
Lifecycle Management of Data Assets
Technical Implementation and Integrations
Impact on Teams and Productivity
Evolving Problem Areas and Customer Feedback
Interesting Applications of Workstream
Lessons Learned in Building Workstream
When Workstream is Not the Right Choice
Future Plans and Roadmap
Final Thoughts and Closing Remarks