Summary
Business intelligence has been chasing the promise of self-serve data for decades. As the capabilities of these systems have improved and become more accessible, the definition of what self-serve means has changed. With the availability of AI powered by large language models, combined with the evolution of semantic layers, the team at Zenlytic has taken aim at this problem again. In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration, combined with semantic modeling, that enables an intuitive way for everyone in the business to access the data they need to succeed in their work.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack
- Your host is Tobias Macey and today I'm interviewing Paul Blankley and Ryan Janssen about Zenlytic, a no-code business intelligence tool focused on emerging commerce brands
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Zenlytic is and the story behind it?
- Business intelligence is a crowded market. What was your process for defining the problem you are focused on solving and the method to achieve that outcome?
- Self-serve data exploration has been attempted in myriad ways over successive generations of BI and data platforms. What are the barriers that have been the most challenging to overcome in that effort?
- What are the elements that are coming together now that give you confidence in being able to deliver on that?
- Can you describe how Zenlytic is implemented?
- What are the evolutions in the understanding and implementation of semantic layers that provide a sufficient substrate for operating on?
- How have the recent breakthroughs in large language models (LLMs) improved your ability to build features in Zenlytic?
- What is your process for adding domain semantics to the operational aspect of your LLM?
- For someone using Zenlytic, what is the process for getting it set up and integrated with their data?
- Once it is operational, can you describe some typical workflows for using Zenlytic in a business context?
- Who are the target users?
- What are the collaboration options available?
- What are the most complex engineering/data challenges that you have had to address in building Zenlytic?
- What are the most interesting, innovative, or unexpected ways that you have seen Zenlytic used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Zenlytic?
- When is Zenlytic the wrong choice?
- What do you have planned for the future of Zenlytic?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines. RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team. RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again. Visit [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack) to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Legacy CDPs charge you a premium to keep your data in a black box. RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost effective solution. Plus, it gives you more technical controls so you can fully unlock the power of your customer data. Visit dataengineeringpodcast.com/rudderstack today to take control of your customer data. Your host is Tobias Macey, and today I'm interviewing Paul Blankley and Ryan Janssen about Zenlytic, a no-code business intelligence tool focused on emerging commerce brands. So, Paul, can you start by introducing yourself?
[00:00:49] Unknown:
I'm Paul. I'm the co-founder and CTO of Zenlytic. I got started doing math and computer science in undergrad, worked for Roche in their math department for a little bit, then went on to grad school at Harvard, also in machine learning, which is where Ryan and I met.
[00:01:09] Unknown:
And Ryan, how about yourself?
[00:01:11] Unknown:
Yeah. Thanks, Tobias. Glad to be here. My background is I was an engineer in undergrad, and an engineer just out of undergrad, but I quickly jumped across to the end user side, I guess. I moved to the UK, became a VC there, and spent a bunch of time in different analytical roles on the other side of the BI equation. So I was a data consumer type. When I went back to the States, that's when I decided to cross the table. I knew I wanted to start being a data practitioner again, and started that by going back to grad school, which is where Paul and I met, and we've worked on everything together since then. That was the start of a beautiful friendship.
[00:01:51] Unknown:
And going back to you, Paul, do you remember how you first got started working in data?
[00:01:57] Unknown:
Yeah. So Ryan and I actually started doing data consulting in grad school. That was everything from due diligence on acquisitions to setting up machine learning fraud models. And a big part of what that business became was setting up these analytical stacks: going into companies, everything from seed-stage startups to Fortune 100s, and helping them set up their analytical stacks, setting up what's effectively called the modern data stack. That was part of the genesis of Zenlytic, where we realized that the tools were just not keeping up with the rate of change that we saw in AI and the underlying improvements in the data warehouses. I mean, the performance of Snowflake and BigQuery as they've improved over time is just mind-boggling.
[00:02:46] Unknown:
Yeah. It was a quite exciting place to be, actually. When we were doing that work, there was a genesis of several different things. First, we were seeing the formation of the modern data stack. Right? We were seeing those tools start to form up, we were seeing the capabilities of warehouses get much, much faster, and we were seeing the capabilities of what were, at the time, like the great grandparents of modern large language models. Paul and I were studying computational data science, and we were working with, I guess, what were the small language models at the time. And we were watching all those things evolve at a pretty fast rate, and I guess that's what ultimately led us to building Zenlytic. But watching that, watching the pace, it was just such an exciting place to be.
[00:03:27] Unknown:
And in terms of what you are building at Zenlytic, can you give a bit of an overview about what it is and some of the story behind how you came to this particular problem as the thing that was worth consuming your time and energy and sleep?
[00:03:42] Unknown:
Absolutely. I would say the main driver was seeing the changes in AI, coupled with the changes and improvements in the modern data stack, but also the abysmal lack of adoption inside of the companies we were helping on a consulting basis. It was just really, really hard for end users of these products to use them. They just wouldn't go in. The usage on all these dashboards and all these BI tools is really bad. And it's because no matter how much work you do to try to configure these and set them up and keep things clean, it's just hard to poke around in this interface and actually make self-serve work. So our whole goal as a company is to make self-serve truly possible.
Yep.
[00:04:28] Unknown:
And we've all been there, where you're kind of banging your head against the wall as an analytics engineer trying to drive adoption of the tools. Right? It's like, why won't people just use these cool data consumption tools in the end? That's one of the biggest challenges actually working in the field. And I guess our belief, or what we learned in our time doing that, is that a lot of the tools for the end users are really built by data nerds for data nerds. Right? They're actually fairly sophisticated; a lot of them start with a SQL query, for instance. It's just beyond the capabilities of someone who is more focused on the domain of their job. They don't wanna make understanding and using these tools a full-time profession.
And we just think there's a gap there. We think part of that has been limitations of the tech. Right? I think a lot of tools have been as self-service as they could be, given the limitations of the tech. So we saw static dashboards at the start, because that's as fast as a warehouse could run. The tech has slowly advanced, but, and we're almost talking in weeks now, it's really the capabilities being unlocked by the sophistication of what we're seeing with large language models. As we know, the end state for a lot of BI, for driving that adoption, usually ends with an email to the data team: hey, can you pull this data for me?
So we know that the end state is actually very conversational. It just happens to be with a data analyst or an analytics engineer right now. We're finally starting to see capabilities in the underlying tech where that conversational interaction can be handled a lot of the time with a large language model. So that gets us really excited.
[00:06:18] Unknown:
And you've mentioned being a consumer of business intelligence tools for various portions of your career. And business intelligence is obviously a very mature and crowded market and has gone through several different generational shifts with different areas of focus. There are business intelligence products for every industry vertical as well as horizontals. And I'm wondering, what was your process for identifying the specific problem that you were focused on solving and some of the ways that you could build a new entrant into this market that would actually stand out and be compelling to the audience that you're focused on? It's a great question. So I think the big thing that we're focusing on is self-serve, is that you're able to actually have end users,
[00:07:07] Unknown:
answer their own questions as opposed to, again, like Ryan said, emailing the data team. And the main driver, in my opinion, of what changes the BI market is when the underlying infrastructure changes. You had OLAP cubes way back in the day, when you couldn't really do much else computationally besides OLAP cubes. Then, as technology advances, you have Tableau come in with indexes, and all of a sudden you're able to index this data in memory in Tableau and have this level of interactivity with it. A higher level of self-serve than was possible with OLAP cubes. Then you have the data warehouses like Snowflake and BigQuery, and Looker sits on top of them, and all of a sudden you're able to explore around, you know, explore from here on a dashboard and actually slice something by something maybe not on the dashboard.
A level of interactivity and a level of self-serve that wasn't previously possible with Tableau. And the main thing that we saw was that the rate of change in AI meant that that was gonna be the next wave. That was gonna be the next big underlying change that drives capabilities that were previously not possible in the BI stack. So that's exactly what we're building and what we're taking advantage of to bring that next layer and that next level of self-serve.
[00:08:28] Unknown:
The really important thing is that with self-serve, the goalposts have always moved as to what self-serve actually means. Right? I guess the framework that we think about is, what can be achieved by someone using data tools who doesn't know SQL or Python, someone who wouldn't be considered very technical. And as Paul says, we've moved from, originally, just basic dashboards, through more dimensional exposure, dimensions and metrics that can be accessed in an easy way. And then the next step for us is self-evident with this conversational technology. I think there's an interesting discussion to be had as well around what needs to happen in the data stack to enable the LLMs. That's something we also feel very passionate about, because these LLMs are very capable of hallucinating and making mistakes. My wife and I were just fooling around with ChatGPT the other night, and we were asking it for our bios. It gave a great biography of my wife, and it talked for a whole paragraph about her time at Goldman Sachs, what she did there, and her progression, everything very articulately.
My wife has never worked for Goldman Sachs. That's really funny when you're sitting at home playing around with ChatGPT. It's catastrophic when you're relying on it for mission-critical business reporting. You can't just make stuff up. So we actually believe that while the large language models are great, this text-to-SQL approach is not the right approach. We think it's necessary to also really think about the semantic layer as an essential tool for enabling this. And it's really the intersection of these models plus the semantic layer that enables this sort of self-serve paradigm.
[00:10:08] Unknown:
As you've mentioned, what self-serve actually constitutes, and who that self happens to be, has been a very progressive exploration of what's possible, what's practical, what's pragmatic, and how much effort a given team is willing to put into making that self-serve aspect possible, from both directions: the engineering effort to constitute the data stack and set up all of the semantic elements to make data exploration possible and understandable and intuitive for people who don't necessarily have a technical background, as well as the amount of effort that the business users will put in to understand those technical and data semantics, versus just saying, as we've been discussing, from a conversational perspective, tell me how my sales are doing. Because that's not specific enough. Which sales? Where? Why? How?
Even in this conversational mode, there's a level of nuance and background that's necessary for somebody to be able to make effective use of it. And so I'm wondering, before we get too much into the semantic modeling and the conversational UI, what are the elements that are coming together now that make you think you're actually going to be able to effectively deliver on self-serve as everybody thinks it's supposed to mean, versus what we've mutated it to mean because of the limitations of what we could actually do?
[00:11:31] Unknown:
Absolutely. I think the biggest thing there is actually the conversational component itself. Because it's exactly what you said. Right? It's like, how are my sales doing? And the right response there isn't actually a plot with some arbitrarily chosen revenue metric. It's, what do you mean? You've got gross sales, net sales, and then net profit; which sales are you talking about? And the person might be like, oh, well, I mean, that's sales. And then, okay, now you can show them the plot. It's being able to ask those follow-up questions and push the end user toward what they're actually trying to get to. An example that we would see is someone asking, what's my best marketing channel? It's like, whoa, that's sort of a loaded question. So you've gotta ask clarifying questions and be able to help the end user who's asking the question articulate what it is they actually want. Because, as a person who's received a lot of these emails asking about data, that is part of what you're doing. Someone's asking about churn, and you're able to say, do you mean customers we've completely lost or just subscriber churn?
And just being able to ask those clarifying questions is a really big part of being able to make self-serve possible, because there's never a one-shot answer.
[00:12:46] Unknown:
And both sides exist for that too. Right? Even if you give the right definition of churn, maybe you didn't specify you wanted it weekly instead of monthly, or whatever. Some of the earliest versions of this, back when we were still experimenting with the tech, before ChatGPT was a thing, before GPT was a thing at all, the earliest versions of Zenlytic started off with a single question, which is a good start. But as we all know as data practitioners, when is the last time someone asked you for a quick data pull and it ended in a single turn? Right? Almost never. So that's why we're so bullish on chat, actually: the ability to both understand that nuance and context and also prompt and clarify on both sides to get the person to the right answer is very, very powerful.
[00:13:30] Unknown:
Digging more into that kind of semantic modeling and the kind of upfront investment that's required to be able to actually enable something like a large language model to operate on top of the business context and the kind of domain objects that are necessary to be able to do that data exploration, what are the kind of upfront investments that have to be made before Zenlytic can then actually be used effectively?
[00:13:57] Unknown:
So I would say the setup is gonna be easier than Looker, but similar in nature, right now. You're gonna basically set up views that sit on top of tables, and you can define metrics. One of the key differences between us and Looker is that we don't have Explores. You just define primary and foreign keys, and we handle all of the joins, basically. So that makes the setup a little bit easier. But the really, really exciting thing is that we have a recent beta feature that lets you have this semantic layer setup also done by GPT, where you basically say, hey, these are the tables that I wanna include, go and give me a first pass. So it's kind of like a super-powered Copilot for setting up these semantic layers. Because everyone who's done this before knows it's a pain to maintain them and a pain to set them up, and most of that's because you're writing this boilerplate LookML or dbt metrics or metrics in the YAML definitions, whatever it is, and it's a pain. So we have some AI tooling around making that less of a pain.
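As a rough sketch, the views-plus-keys setup Paul describes (declare keys once, derive joins instead of hand-writing Explores) might look something like the following. All class names, fields, and table names here are hypothetical illustrations, not Zenlytic's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class View:
    """A view sits on top of a warehouse table (hypothetical schema)."""
    name: str
    table: str
    primary_key: str
    # Maps a local foreign-key column to the name of the view it references.
    foreign_keys: dict = field(default_factory=dict)
    # Maps a metric name to its SQL aggregation.
    metrics: dict = field(default_factory=dict)

def infer_join(from_view: View, to_view: View) -> str:
    """Derive a join clause from declared keys instead of a hand-written Explore."""
    for local_col, ref_view in from_view.foreign_keys.items():
        if ref_view == to_view.name:
            return (f"JOIN {to_view.table} ON "
                    f"{from_view.table}.{local_col} = {to_view.table}.{to_view.primary_key}")
    raise ValueError(f"No declared key path from {from_view.name} to {to_view.name}")

orders = View("orders", "analytics.orders", "order_id",
              foreign_keys={"customer_id": "customers"},
              metrics={"net_revenue": "SUM(orders.amount - orders.refunds)"})
customers = View("customers", "analytics.customers", "customer_id")

print(infer_join(orders, customers))
```

Because the join path is derived from the declared keys, there is nothing to duplicate per Explore, which is also what keeps the layer DRY.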
[00:15:00] Unknown:
And digging more into Zenlytic itself, can you talk a bit about some of the implementation detail, the architecture that you've had to build, some of the internal data modeling that you've had to do to be able to flexibly map these different domain objects into the large language model space, and that translation layer to be able to move between the very structured, hierarchical, data-driven aspect of what we're actually trying to dig into and the very messy, confused, adaptable, constantly evolving human language space?
[00:15:36] Unknown:
Totally. It's a great question. The main thing there is being able to have an effective way to serve the context of the semantic layer to the model. Because the way I think about it, what the model brings to the equation is comprehension. It understands what the user is asking about very, very well. The semantic layer brings correctness. If you ask for net revenue by marketing channel, it will never give you a wrong answer. But the intersection of those is actually giving the model the right context, where the model has the ability to look at the semantic layer. If someone asks for total net revenue over time, it knows about the metric net revenue. It also knows the right date field to choose to trend net revenue over time, and it knows how to apply the filter "last month" to that date field to make sure that the query actually gets executed correctly. So part of what the semantic layer needs to encode is those core concepts that people speak about colloquially but that have to mean explicit things in the data warehouse, time being one of the most complicated ones and one that we've invested a lot in handling correctly.
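The resolution step Paul describes, picking the metric, its canonical date field, and turning a colloquial phrase like "last month" into an explicit range, can be sketched roughly as below. The layer contents and field names are hypothetical, and a real system would hand this context to the model and validate its structured answer rather than hard-code it:

```python
from datetime import date, timedelta

# Assumed semantic-layer context served to the model: each metric carries its
# SQL definition and the canonical date field used to trend it over time.
SEMANTIC_LAYER = {
    "net_revenue": {"sql": "SUM(amount - refunds)", "date_field": "order_date"},
    "sessions": {"sql": "COUNT(DISTINCT session_id)", "date_field": "session_date"},
}

def resolve_query(metric: str, time_filter: str, today: date) -> dict:
    """Turn the model's structured choice into an executable query spec."""
    spec = SEMANTIC_LAYER[metric]  # KeyError here is the correctness guard
    if time_filter == "last month":
        # "Last month" becomes an explicit calendar range, not a guess.
        first_of_this_month = today.replace(day=1)
        end = first_of_this_month - timedelta(days=1)
        start = end.replace(day=1)
    else:
        raise ValueError(f"Unsupported time filter: {time_filter}")
    return {"select": spec["sql"], "trend_on": spec["date_field"],
            "where": f"{spec['date_field']} BETWEEN '{start}' AND '{end}'"}

print(resolve_query("net_revenue", "last month", date(2023, 4, 15)))
```

The point of the sketch is the division of labor: the model only chooses among named things; the layer supplies the SQL and the date semantics.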
[00:16:44] Unknown:
One important thing to discuss here too is that the important part of the semantic layer design is getting to the right primitives, the primitives for actual end user consumption, I guess. Right? Essentially, those semantic layers are a translation between data terms and user-understandable business terms. And if you get those primitives right, then it becomes much easier for the language models to understand it as well. Those are sort of the stake in the sand, which is where a lot of the magic happens. I think we've collectively, as data folks, gotten pretty close to discovering what those right primitives look like, and they're fairly consistent across tools. Metrics and dimensions and filters: those primitives are flexible enough to handle the necessary cases, but also understandable to the end user, and powerful.
So it's a good combination. I guess one area where our approach differs slightly is that we tend to avoid the use of a data explorer or a data mart or some sort of end-consumption table. In those cases, if there's a separate sales metric in three different data marts, that can also be confusing. Right? There are five data marts that have sales in them; that's confusing if you're a human trying to use semantic layers. So one thing that we endeavor to do is actually abstract that part out of the consumption layer, so you don't need to choose a data mart or some set of tables or whatever.
You just need to deal with your own metrics, basically. I think that's actually one of the things that we, again, as data people, have been kind of exploring and flirting with for a while, and we're finally starting to put that behind the scenes so that the end users don't need to focus on it. As far as the kind of evolution
[00:18:36] Unknown:
of your work and the corresponding evolution of the supporting technologies, I'm wondering how your understanding and implementation of semantic layers has evolved, and some of the necessary substrate that you've had to put in to be able to have an effective representation of those domain semantics, so that users can create their own mappings and their own business objects, but do it in a way that is maintainable from your side without being, you know, a spaghetti mess?
[00:19:10] Unknown:
I think the first part of that is not having a concept like Explores. Because having a concept that inherently makes you duplicate joins and stuff leads to code, and a semantic layer, that's just hard to maintain overall. So not having that concept is super helpful. And on the iterative side, there have for sure been things that we've realized as we progress. Like, hey, there's a real need for people to be able to do X. A good example is that a lot of the time you actually have metrics that span multiple tables that cannot be joined together. You have to basically run those queries and then merge the results at the end, and then create some metric based on the merging of those results.
That's something that we actually built into the semantic layer, the ability to do that. So from the end user's perspective, they can just look at, say, number of sessions and inventory over time. There's no way to join those tables together; you basically just have to aggregate them up by date and merge the results at the end. A feature like that is something we initially weren't sure we would need to include in the semantic layer, but there's a huge amount of demand for that ability, and it makes it as easy as a click instead of figuring out a whole merge-results kind of interface.
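The merge step Paul describes, run each aggregation separately and then stitch the result sets together on the shared date grain in the application layer, can be sketched like this. The data and metric names (sessions, inventory) are toy values for illustration:

```python
def merge_results(*result_sets):
    """Outer-merge per-date aggregates from queries that can't be joined in SQL.

    Each result set is a list of row dicts sharing a 'date' key; rows from
    different sets are combined when their dates match.
    """
    merged = {}
    for rows in result_sets:
        for row in rows:
            entry = merged.setdefault(row["date"], {"date": row["date"]})
            for key, value in row.items():
                if key != "date":
                    entry[key] = value
    # Return one row per date, sorted, like a SQL full outer join on date.
    return [merged[d] for d in sorted(merged)]

# Two aggregates from tables with no join path between them.
sessions = [{"date": "2023-05-01", "sessions": 120},
            {"date": "2023-05-02", "sessions": 98}]
inventory = [{"date": "2023-05-01", "inventory": 4200}]

print(merge_results(sessions, inventory))
```

A derived metric (say, inventory per session) would then be computed over the merged rows rather than in either source query.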
[00:20:32] Unknown:
And the other big philosophy, I think, which is obvious, but I don't think we do it all the time as data folks, is just DRYness. We endeavor to use software engineering best principles, and in most cases we do, but spaghetti DAGs, one thing that they all have in common is that they generally get pretty wet. Right? There's lots of repeated code and lots of things being inherited in multiple places, and that's a hard problem for data folks, because it's a lot harder to make a DAG DRY than an application. I think the tools are still evolving in that direction, but that's something that we've always kind of put a pin in: alright, we want this to be as DRY as possible. That's actually been like a north star for us in the design of everything, and I think that simplifies things a lot. I think there's still a long way to go, for us and for all tools, to really achieve that, but as a guiding principle for simplifying the spaghetti, that's probably a good first step. The other interesting element of this space is the rapid evolution of large language models. And as you mentioned, when you first started this, GPT was either not a thing yet or just barely becoming a thing.
[00:21:45] Unknown:
Now we're on to ChatGPT 4.5, I think. And obviously, there have been exponential leaps and bounds in terms of their capabilities, as well as their capability to, as you said, hallucinate in quite detailed fashion and sometimes quite convincingly. And I'm wondering how that has impacted your overall product approach and some of the additional testing and validation that you've had to do as these language models have become more sophisticated and potentially more
[00:22:22] Unknown:
We would have been much more worried if we had, you know, GPT-4 generating SQL statements, because it could be wrong in such sophisticated, impossible-to-detect ways. But since we're referencing a semantic layer, if it just comes up with a metric that doesn't exist, the semantic layer can say, hey, that doesn't exist. Error message: you asked about something that doesn't exist. So that hallucination just ends up being a simple error, as opposed to something that's catastrophically bad for the business.
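That guard can be as simple as rejecting any field the model names that the layer does not define, so a hallucination surfaces as an explicit error instead of plausible-but-wrong SQL. A minimal sketch, with hypothetical metric names:

```python
# Hypothetical set of metrics defined in the semantic layer.
KNOWN_METRICS = {"net_revenue", "gross_sales", "sessions"}

class UnknownMetricError(ValueError):
    """Raised when the model asks for a metric the semantic layer doesn't define."""

def validate_model_output(requested_metrics):
    """Fail loudly on hallucinated metrics instead of running fabricated SQL."""
    unknown = [m for m in requested_metrics if m not in KNOWN_METRICS]
    if unknown:
        raise UnknownMetricError(f"Metric(s) not in semantic layer: {unknown}")
    return list(requested_metrics)

validate_model_output(["net_revenue"])      # a real metric passes through
try:
    validate_model_output(["net_revanue"])  # a hallucinated name is caught
except UnknownMetricError as err:
    print(err)
```

The same check applies to dimensions and date fields: anything the model emits must resolve against the layer before any SQL is generated.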
[00:22:52] Unknown:
Yeah. I would say that our approach here, especially, I mean, things are changing so fast and no one knows where they're gonna land; we're talking day-by-day developments. But I think our approach is plan for the worst and hope for the best. I will say that we're building for current tech, so nothing in Zenlytic requires an increase in GPT's comprehension capabilities. GPT-5 would be nice, but it's built to work with GPT-4 and stuff that's already been published, basically. And basically, if they turned off the development of all large language model technology today, not just a 6 month pause, a 6 year pause, you know, even as a software organization, I feel like Copilot and GPT have made us as a team, what would you say, Paul, at least 25% more productive? Right? And that's just tools that are in the market. So there are real world use cases happening right now, and we're building the real use cases that use the fielded tech. If things get better, faster, smarter, that's really just icing on the cake for us.
[00:24:01] Unknown:
Yep. The other thing I'd add is that it's enabled features that I personally would not have even dreamed of being possible. Like being able to just point at a bunch of tables and figure out, with no actual keys, okay, what are reasonable joins? What are reasonable metrics? What makes sense here? And just being able to have a really good first pass. I mean, it's never perfect, but it's like having the developer tools to be able to just go in and have most of your work done, and you just tweak a few things. I never would have even thought that that was possible. And the capabilities of the underlying LLMs are just so strong that we're able to build features that I wouldn't have even dreamed of before ChatGPT came out.
[00:24:47] Unknown:
I think, obviously, there's a lot of hype right now, but people are still underappreciating the second-order effects of this tech. In traditional software, for instance, you see people tweeting all the time: "I'm an indie hacker. I've always wanted to build these different tools, but I couldn't quite do it with my capabilities. This would have been impossible for me before, and now, with GPT, I can do it in a morning."
So it's unlocking all sorts of opportunities for innovation elsewhere, even for people who aren't great coders. I think we're going to see a lot of similar second-order effects when you make that really fast build, measure, learn loop possible. I'm just excited about that in general.
[00:25:31] Unknown:
In terms of actually onboarding onto Zenlytic, I'm wondering if you can talk through the expectations you have on the customer side. Do they have a data warehouse? Are they brand new, with no idea what they're doing? What are the technical and system capabilities you're expecting, and then what's the process of getting somebody onboarded and integrated with Zenlytic so they can start actually exploring their data and asking questions?
[00:26:00] Unknown:
So all you basically need are credentials for a data warehouse. We support all the major data warehouses, and if you come with credentials, you can put those in, click this AI deploy button we've been talking about, make a few tweaks to make sure things are customized the way you need them, and you're off to the races. The implementation is comparatively quite easy: you just come with database credentials.
[00:26:29] Unknown:
In terms of database credentials, are you largely assuming that somebody is a single-application shop — this is their line-of-business product, and you just want to tap into their application database that's all denormalized and transactional? Or are you expecting "we've got everything in our data warehouse"? Or is it just "yes"?
[00:26:51] Unknown:
Either of those — well, it's expecting a data warehouse. Definitely expecting a data warehouse. You can have data from all of these sources loaded in there, but anything that's not in that one data warehouse
[00:27:03] Unknown:
isn't going to be joinable with the other stuff that lives somewhere else. If you wanted to do something like that, you'd have to use some sort of distributed query system like Starburst, which is not something we're building. One of the neat things we haven't really talked about — I keep dwelling on this part of Zenlytic just because I'm so excited about it — is that with a lot of the text-to-SQL demos we're seeing, if you read between the lines, it's a toy example. It's always a couple of tables, usually one, sometimes a few, that are very small in both rows and columns. The semantic layer makes it possible to actually make queries across an entire data warehouse of arbitrary size.
It doesn't matter how complex the tables are, how much data is in there, or how many metrics you have: those are all defined in advance, and you can have bulletproof confidence in joining and manipulating that data using the semantic layer. That's why we're warehouse-centric only: we don't do any sort of application-layer direct connectivity.
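The approach Ryan is describing — metrics, dimensions, and join paths all defined in advance, so that generated queries can only draw from a vetted vocabulary — can be sketched roughly like this. This is a minimal illustration, not Zenlytic's actual implementation, and every name in it is hypothetical:

```python
# Minimal sketch of a semantic layer: metrics, dimensions, and join paths
# are declared once, and SQL is compiled deterministically from them.
# An LLM picks (metric, dimension) pairs from this vocabulary; it never
# writes raw SQL. All names are hypothetical, for illustration only.

METRICS = {
    "gross_revenue": {"sql": "SUM(orders.amount)", "table": "orders"},
    "units_sold": {"sql": "SUM(orders.quantity)", "table": "orders"},
}
DIMENSIONS = {
    "region": {"sql": "customers.region", "table": "customers"},
    "order_date": {"sql": "orders.created_at", "table": "orders"},
}
JOINS = {
    ("orders", "customers"): "orders.customer_id = customers.id",
}

def compile_query(metric: str, dimension: str) -> str:
    """Deterministically compile a (metric, dimension) pair into SQL."""
    m, d = METRICS[metric], DIMENSIONS[dimension]
    sql = f"SELECT {d['sql']} AS {dimension}, {m['sql']} AS {metric} FROM {m['table']}"
    if d["table"] != m["table"]:
        # The join condition is pre-declared, so it is always correct.
        sql += f" JOIN {d['table']} ON {JOINS[(m['table'], d['table'])]}"
    return sql + " GROUP BY 1"

print(compile_query("gross_revenue", "region"))
```

Because the join graph and metric SQL are fixed ahead of time, the only thing the language model has to get right is which pre-defined names to pick, which is what makes queries over an arbitrarily large warehouse tractable.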
[00:28:01] Unknown:
Once somebody is set up and using Zenlytic for data exploration, I'm wondering if you can talk through some of the typical workflows, and in particular some of the collaboration aspects: working from data teams to operational teams, across operational teams, and within operational teams. What are some of the ways that data exploration can be made visible across those different roles?
[00:28:27] Unknown:
I think the biggest collaboration opportunity actually isn't in our interface, or in anyone's interface at all. It's where the team already is, and for a lot of teams, that's Slack. So we have a deep integration with Slack, where you can just be in a channel, mention Zenlytic — "how has this campaign been doing in the last week?" — and boom, you get an answer in the thread right there. One of the things we've focused on a lot in collaboration is bringing the data to where people already are. For a person in the flow — trying to run a new campaign, trying to send out a new email newsletter — they don't want to go log in to some other tool, no matter how easy it is to use. They want to stay right where they are and just get the answer they need. That's one of the big things we focus on collaboration-wise: taking data to where people already are.
[00:29:18] Unknown:
One funny thing — this is a marketing story. I remember a friend of mine who is a marketer said that a lot of bad marketers assume people wake up and say, "Oh man, I wonder what Coca-Cola is doing today." In a way, we're collectively guilty of that with BI tools as well. We assume that people are enthusiastic about logging in and configuring and finding the right stuff in that tool, when in fact they're incredibly distracted with the day-to-day and they just want to get their data. So a big part of that collaboration is going to where they are, and where they are happens to be the place where they all are, so they can collaborate together there.
[00:29:55] Unknown:
In terms of your engineering effort of building Zenlytic, I'm wondering if you can talk through the proportional effort of dealing with large language models and how they get deployed into your infrastructure, versus managing the semantic layers, exposing them to users, and dealing with the edge cases around that, versus the UI and chat interface. What are the areas that have taken the most time
[00:30:24] Unknown:
versus what your expectations were going into this? I was going to say the most surprising one was how fast we were able to deploy a lot of the large language model stuff, especially once you figure out the tooling and become comfortable with writing code in a way that works well with large language models — which is a bit of a different skill than writing more deterministic programs. Once you become familiar with that, you can push new features and improvements really fast. That's been surprising for me: just how fast that process has been. Yeah, I would say the green light is deploying language tech. The yellow light is the semantic
[00:31:03] Unknown:
layer tech, which is led by Paul and our team. A big part of that is Paul going mad scientist, locking himself in a room for a week and making a big adjustment to this very sophisticated semantic layer tech — but that's the yellow light. The hardest part, the red light, is probably the user interface design for the BI tool. We haven't really talked about the fact that Zenlytic is also a fully featured BI tool outside of the chat. Building a BI tool in general takes a tremendous amount of thought and iteration on the UI side: you're taking a very complex thing with a ton of boundary conditions and a very murky use case, the UI is the glue that holds those together, and it has to be simple enough that the end user will understand it — as per our self-serve pitch. So UI design is difficult in general, and then we have the added difficulty of how it meshes with how people use BI, how it meshes with the chat, and how you jump back and forth between the chat and the rest of the tool. It's challenges on challenges, and in the development of Zenlytic we probably spent more time making sure that is really elegant and easy to use than on almost anything else. Is that fair, Paul?
[00:32:19] Unknown:
Oh, yeah. It's just so hard to make an interface that takes something really complex, like a large data warehouse with a bunch of complicated metrics, and makes it as palatable and easy as possible — where a non-technical user can just click in, click a few things, and it works how they expect it to. It's shockingly hard to make it just work, basically.
[00:32:41] Unknown:
And actually, when I was investing as a VC, I come from an era when mobile was the big thing, and companies were winning and losing based primarily on how usable their interface was. It's the same thing: people were inventing new designs for mobile, those had to be very simple and straightforward, and that became an axis of competition. We've always tried to carry that into Zenlytic and keep things simple and understandable — zen, even — but it's also a million times harder because you're dealing with something as inherently complex as data. It's been an interesting but very fun challenge. In terms of that kind of business intelligence user interaction design,
[00:33:18] Unknown:
being able to have that chat interface so that it's exploratory, but also the more structured experience — here's a dashboard, here's a set of charts, here's a way to dig deeper once you've gotten to a starting point from the conversational aspect — particularly given the focus on self-serve as the default operational mode: what are some of the context clues and guardrails that you've built in to guide people into the pit of success as they explore data? Where it's "tell me about all my sales in North Dakota," and then saying, "Okay, what do you mean by sales? Do you mean gross revenue? Adjusted revenue? Total units sold?"
And then, once you get to some sort of visualization, giving them the context clues — particularly given that conversational lead-in, are there ways you're able to pull the useful semantic elements out of that conversation and add them as labels or context clues within the graph? To say: based on what you're asking, these are the axes we're going to present to you, and these are the grains we're going to use for you to dig deeper.
[00:34:29] Unknown:
Yeah, that's one of the big things. One feature we have is summarization: as you go along asking questions, you'll see both the plot and a summary of what's actually going on — specifically the thing you asked about. So if you ask about a spike in traffic over the holidays, it might show you the line with the spike, and it will also draw your attention to it: "Hey, this is the spike; it's 75% higher than the previous day," or something like that.
That helps make the chart understandable, but the other thing it does is carry that information in the conversation. As you keep asking questions — like you were mentioning — if you reference something from a little earlier, it's still able to be aware of that and use it in answering your subsequent questions.
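The summarization behavior Paul describes — spot the largest deviation in a series and phrase it as a comparison to the prior period — could look roughly like this. A toy sketch with made-up numbers, not Zenlytic's actual logic:

```python
# Toy "summarize the spike" step: find the largest day-over-day change in a
# series and describe it in plain language. Illustrative only; the data and
# phrasing are invented, and this is not Zenlytic's implementation.

def summarize_spike(dates, values):
    """Return a one-line summary of the largest day-over-day change."""
    best_i, best_pct = None, 0.0
    for i in range(1, len(values)):
        prev = values[i - 1]
        if prev == 0:
            continue  # avoid dividing by zero on empty days
        pct = (values[i] - prev) / prev * 100
        if abs(pct) > abs(best_pct):
            best_i, best_pct = i, pct
    if best_i is None:
        return "No notable change in this period."
    direction = "higher" if best_pct > 0 else "lower"
    return (f"{dates[best_i]}: traffic was {abs(best_pct):.0f}% "
            f"{direction} than the previous day.")

days = ["Dec 22", "Dec 23", "Dec 24", "Dec 25", "Dec 26"]
traffic = [100, 104, 98, 175, 110]
print(summarize_spike(days, traffic))  # flags the Dec 25 holiday spike
```

The same summary string can then be appended to the conversation context, which is how a follow-up like "why did that happen?" can resolve "that" to the spike.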
[00:35:23] Unknown:
A couple of neat things — two quick observations that came out of our user testing throughout the design process. What Paul's alluding to, I think, is the right approach: concise on the inputs, verbose on the outputs is the way I think about it. We found that people like to see more on the way out; you don't want people to be lost, basically. So over-communicating is generally good on the feedback side. On the input side, people tend to be busy and distracted, and you want to make that as simple and clean as possible.
One great example of that: early on in our development, I realized people love to paw all over their visualizations. It's funny, because this is actually one of those cases where the limitations of the tools shape the interaction. Every BI tool has a visualization library with its own way of handling clicks, and it rarely translates into truly dynamic, interactive plots. But just from watching people, they click on stuff way more than you'd expect.
Our way to address that was with lots and lots of context menus — very smart context menus. People click on a bar and, yes, of course there's a drill option there, or a filter option. But we've also seen people take a line chart and drag over it to see what happens, and we pop up a special context menu when that happens: do you want to zoom in here? Do you want to explain this particular change? We make sure all of the steps are highly contextual.
And whether you're doing it in a GUI or in chat, it's treated like a conversation that guides you to the next most logical question. You're trying to anticipate the next steps in that conversation.
[00:37:05] Unknown:
And in terms of your experience of building Zenlytic, working with your customers, and using it internally to build Zenlytic itself, what are some of the most interesting or innovative or unexpected ways that you've seen it used? So I think one of the most unexpected ways, for me, was one of our customers
[00:37:23] Unknown:
using the explain-the-change functionality. Usually we've seen people use it like: there's a spike, why did that thing spike? There's a dip, why did that thing dip? They were actually using it on a flat, no-real-change, week-over-week sales chart. The reason was that they wanted to see the breakout inside of it. Even in a week or a month where there's no major change in the top-line number, there are reps that did really well, reps that did poorly — all these changes within the reps. They were able to zoom in and see: okay, these customers with this rep aren't actually that healthy, despite the top-line number not looking like it moved very much; that was offset by this other new customer we acquired, who went to this rep that did really well. They were able to find drivers of how to actually improve sales performance and sales efficiency for their reps, even when there's no major month-to-month change in their sales numbers. That was surprising for me. I did not anticipate people using it on flat charts.
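The pattern Paul describes — a flat top line hiding large offsetting per-rep moves — is essentially a contribution analysis. A toy sketch with hypothetical data, not the actual feature:

```python
# Toy contribution analysis behind an "explain change" feature: break a
# period-over-period delta into per-segment deltas and rank them by impact.
# A near-flat top line can hide large offsetting moves. All data and names
# here are hypothetical.

def contributions(prev, curr):
    """Per-segment deltas between two periods, largest absolute impact first."""
    segments = set(prev) | set(curr)
    deltas = {k: curr.get(k, 0) - prev.get(k, 0) for k in segments}
    return sorted(deltas.items(), key=lambda kv: -abs(kv[1]))

last_month = {"rep_a": 50_000, "rep_b": 42_000, "rep_c": 31_000}
this_month = {"rep_a": 38_000, "rep_b": 43_000, "rep_c": 30_500, "rep_d": 12_000}

top_line = sum(this_month.values()) - sum(last_month.values())
print(f"Top line moved by {top_line:+}")  # small move: looks flat
for rep, delta in contributions(last_month, this_month):
    print(f"  {rep}: {delta:+}")  # large offsetting per-rep moves
```

Here the top line moves by only +500 while rep_a drops by 12,000 and a new rep_d adds 12,000 — exactly the kind of under-the-surface movement the customer was zooming in on.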
[00:38:28] Unknown:
Anything else to add?
[00:38:30] Unknown:
They used that word — I remember that example: "zooming in." It's like in Biology 201 or whatever, when they give you a microscope and you look at the little slide of water and there are all these little paramecia, and you go, "whoa." There's all sorts of stuff happening just under the surface.
[00:38:45] Unknown:
Yeah, exactly. Even though we call it "explain change," it's fine to use it in a flat context. And then, given the fact that you're using these large language models as an operational component, what are some of the most interesting or unexpected edge cases or weird behaviors that you've seen or had to retune around?
[00:39:08] Unknown:
Yeah, I'd say one of the big things is just how much the direction matters when you're actually doing the prompt engineering to get these things to work, and how much work you have to spend handling the hallucinations — the weird scenarios where it goes a little bit off the rails or doesn't quite do what you expect it to do. Going through and handling all of those is a different type of programming, in a way.
So that's been really interesting, because it will go off the rails sometimes and just make stuff up.
[00:39:49] Unknown:
Or format things in ways where you're like, "where did you get this from?" So handling those is tricky. One of the biggest and most interesting things about dealing with LLMs is that you have to throw away a lot of what you know about conventional programming. I'll give you one great example: obviously, a big part of the AI is search-like functionality. In our early prototypes and experiments, the search worked okay until we realized you have to move away from literal search to semantic search. Instead of searching for a dimension name, you search for "slice by this dimension."
And in doing that, you actually get a meaningful representation of the way the LLM thinks. There's a whole new paradigm in dealing with these things that defies a lot of conventional logic when it comes to programming.
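The shift Ryan describes — from matching a dimension's literal name to embedding an action phrase like "slice by this dimension" and retrieving by vector similarity — looks roughly like this. The bag-of-words `embed()` below is a toy stand-in for a real embedding model, and all field names are hypothetical:

```python
# Sketch of "semantic search" over a field catalog: instead of exact-matching
# a dimension's name, index each dimension under an action phrase ("slice by
# ...") and rank candidates by vector similarity. The bag-of-words embed()
# here is a toy stand-in for a real embedding model; names are hypothetical.
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercased bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Each dimension is indexed by a phrase describing the action, not its raw name.
CATALOG = {
    "customer_region": "slice by the geographic region of the customer",
    "order_channel": "slice by the sales channel the order came through",
    "created_date": "slice by the date the order was created",
}
INDEX = {name: embed(desc) for name, desc in CATALOG.items()}

def best_dimension(user_request):
    query = embed(user_request)
    return max(INDEX, key=lambda name: cosine(query, INDEX[name]))

print(best_dimension("break sales down by customer region"))  # customer_region
```

With a real embedding model the matching also survives vocabulary gaps ("geography" versus "region"), which a keyword lookup never would.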
[00:40:42] Unknown:
And in your own experience of building up Zenlytic, growing the business, working with customers, exploring the problem space, and understanding the art of the possible versus what is never actually going to happen, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:40:59] Unknown:
I would say it's just a slog to build a BI tool. There are so many small features — permissioning, all the roles — there are just so many features in there to build.
[00:41:13] Unknown:
It's a lot. It's a slog. There's an old joke in VC: don't build an ERP tool. It's like building a second railroad next to an existing one. There's so much that goes into it, and I would extend BI into that category as well. Some days I get jealous of people who are just building product apps, where they have a single structured table they're reading from and writing to. A BI tool is a whole other beast in that regard. There are a lot of edge cases and a lot of considerations, and they're all very closely related. Again, same thing: you want to change a feature for reasons of interface design — great. But then how is that going to impact governance? How does it impact the composability of the semantic layer? Everything links together.
It's a challenging build, but it's really, really rewarding when it comes together.
[00:42:03] Unknown:
And for people who are looking for that holy grail of self-serve business intelligence, what are the cases where Zenlytic is the wrong choice?
[00:42:12] Unknown:
So I'd say the first one is if you're fine with the off-the-shelf tools. If those get the job done — if you're not feeling that much pain — for instance, if a Shopify dashboard and Google Analytics are getting the job done for you, or close enough, then there's no reason to pay for extra tools. There's no reason to use us, basically. The other one I'd add is if you're writing Python, training your own models, using pandas for this kind of stuff — we're also not the tool for that. You'd use a notebook like Jupyter or Hex, one of the notebook solutions out there.
[00:42:52] Unknown:
As you continue to iterate on and grow and evolve Zenlytic, what are some of the things you have planned for the near to medium term, particular projects you're excited to dig into, or evolutions in the
[00:43:06] Unknown:
LLM and semantic space that you're keeping an eye on? So I think one of the most exciting ones for me is more and more tooling around making management of the semantic layer super easy. I think that's one of the biggest gaps right now; it's one area where we'll be pushing a lot of features, and one thing I'm super excited about. When I have chats with other data practitioners, I often find myself going back to what I'd call the biggest problem
[00:43:33] Unknown:
in the modern data stack today, which is that it's legitimately more work to maintain and build with these tools. The rise of analytics engineering as a profession is because of that added complexity. And there are tremendous benefits: you get powerful customization and flexibility; you get everything exactly the way the business needs it. But there are also trade-offs, and with those trade-offs there are real rationales for the off-the-shelf tools Paul alluded to. There's a time and place for them; the modern data stack doesn't strictly dominate them, because they're deployable in a few clicks and give you decent analytics with less investment. So I have a hunch that, having built up all this incredible power and complexity over the past few years, the next step for the modern data stack is to invert that pyramid: find ways to streamline it and make it easier to maintain, easier to deploy, easier to build. I don't know exactly what that looks like — maybe more use of templates, maybe better use of AI to automate what can be automated in the construction of a data configuration.
I think there are lots of ways we can go with it, but going forward, I think the watershed moment is when we increase the accessibility of these tools as well.
[00:44:56] Unknown:
Are there any other aspects of the work that you're doing at Zenlytic, the overall space of self-service business intelligence, the impact and opportunities for semantic layer improvements, or the use of large language models and AI more generally in the path of data generation and consumption, that we didn't discuss yet that you'd like to cover before we close out the show? I think we covered it pretty well. The only thing I'd say is that it's really the intersection
[00:45:23] Unknown:
of LLMs and the semantic layer that makes this possible. It just doesn't get done with text-to-SQL: the hallucination problem is too severe, and even when it's not hallucinating, these definitions vary from company to company. And the semantic layer by itself is too complicated for end users — business users — to navigate. You really need the combination of LLMs and the semantic layer to make true self-serve possible.
[00:45:50] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspectives on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:46:05] Unknown:
I would wholeheartedly agree with what Ryan was saying about just how complicated it is to manage a semantic layer. I think we're still in the early innings of finding out what good looks like there, but I completely agree with Ryan: it's one of the biggest problems
[00:46:23] Unknown:
overall for managing data. Yeah, I'll revisit that: I'd call it the most important problem in the modern data stack. I think there are a lot of really talented people working to address it right now, so I think it'll be solved quickly. The second thing I'll add, which I think is important, goes back to usability and self-serve. We all know the term "data breadlines." That's still a very real pain. We know that end users want to be more data-driven in the way they think and act, and I feel like they have not been given due consideration in the modern data stack yet. A lot of tools have been built by data people, for data people — that's also why we have so many observability tools. Those solve a very important problem, but for some reason we haven't really focused on the needs of the people who are going to use this data to improve the way they do their jobs. So I think addressing that is important. Right now, if you're a non-technical end user, your choices are to spend a bunch of time mucking around in a spreadsheet, which is time-consuming, very brittle, and very inconsistent; or to ask a data team, which is often a bunch of iterations that take so long your question is in the rearview mirror and no longer actionable.
Or the third situation, which is probably the most common one: you just don't use data. You use your gut — you go finger-in-the-air and say, "yeah, I think it's probably this."
[00:47:55] Unknown:
And I think if you're doing that, you're missing out on tremendous opportunities to be better at your job. So that's the second most important problem: making sure that the self-serve user, the end user, is actually getting access to the data and analytics they need. Well, thank you both very much for taking the time today to join me and share the work that you're doing at Zenlytic. It's definitely a very interesting product and an interesting problem space, and it's great to see some of the ways that AI is coming full circle: it used to be the thing you did at the end, and now it's actually part of the beginning of the work. It's exciting to see how that evolves. I appreciate the time and energy you're putting into it, and I hope you enjoy the rest of your day. Awesome. Thanks for that.
[00:48:38] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it: email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to Zenlytic and Guests
Founders' Backgrounds and Meeting
Genesis of Zenlytic
Challenges in Business Intelligence Adoption
Identifying and Solving BI Problems
Conversational AI in BI Tools
Semantic Layer and AI Integration
Evolution of Semantic Layers
Impact of Large Language Models
Onboarding and User Workflows
User Interface Design Challenges
Unexpected Uses and Edge Cases
Building a BI Tool: Challenges and Lessons
When Zenlytic Is Not the Right Choice
Future Plans and Exciting Projects
Closing Thoughts and Contact Information