Summary
Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow: data analysts create reports that the business uses to understand and direct its operations, but the process is very labor- and time-intensive. The team at Omni has taken a new approach by automatically building models based on the queries that are executed. In this episode Chris Merrick shares how they manage integration and automation around the modeling layer and how it improves the organizational experience of business intelligence.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Truly leveraging and benefiting from streaming data is hard - the data stack is costly, difficult to use and still has limitations. Materialize breaks down those barriers with a true cloud-native streaming database - not simply a database that connects to streaming systems. With a PostgreSQL-compatible interface, you can now work with real-time data using ANSI SQL including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL. Go to dataengineeringpodcast.com/materialize today and sign up for early access to get started. If you like what you see and want to help make it better, they're hiring across all functions!
- Your host is Tobias Macey and today I'm interviewing Chris Merrick about the Omni Analytics platform and how they are adding automatic data modeling to your business intelligence
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Omni Analytics is and the story behind it?
- What are the core goals that you are trying to achieve with building Omni?
- Business intelligence has gone through many evolutions. What are the unique capabilities that Omni Analytics offers over other players in the market?
- What are the technical and organizational anti-patterns that typically grow up around BI systems?
- What are the elements that contribute to BI being such a difficult product to use effectively in an organization?
- Can you describe how you have implemented the Omni platform?
- How have the design/scope/goals of the product changed since you first started working on it?
- What does the workflow for a team using Omni look like?
- What are some of the developments in the broader ecosystem that have made your work possible?
- What are some of the positive and negative inspirations that you have drawn from the experience that you and your team-mates have gained in previous businesses?
- What are the most interesting, innovative, or unexpected ways that you have seen Omni used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Omni?
- When is Omni the wrong choice?
- What do you have planned for the future of Omni?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Omni Analytics
- Stitch
- RJ Metrics
- Looker
- Singer
- dbt
- Teradata
- Fivetran
- Apache Arrow
- DuckDB
- BigQuery
- Snowflake
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Materialize: ![Materialize](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/NuMEahiy.png) Looking for the simplest way to get the freshest data possible to your teams? Because let's face it: if real-time were easy, everyone would be using it. Look no further than Materialize, the streaming database you already know how to use. Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support. Delivered as a single platform with the separation of storage and compute, strict-serializability, active replication, horizontal scalability and workload isolation — Materialize is now the fastest way to build products with streaming data, drastically reducing the time, expertise, cost and maintenance traditionally associated with implementation of real-time features. Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses. Go to [materialize.com](https://materialize.com/register/?utm_source=depodcast&utm_medium=paid&utm_campaign=early-access)
Truly leveraging and benefiting from streaming data is hard. The data stack is costly, difficult to use, and still has limitations. Materialize breaks down those barriers with a true cloud-native streaming database, not simply a database that connects to streaming systems. With a Postgres-compatible interface, you can now work with real-time data using ANSI SQL, including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL. Go to dataengineeringpodcast.com/materialize today and sign up for early access to get started. If you like what you see and want to help make it better, they're hiring.
Your host is Tobias Macey. And today, I'm interviewing Chris Merrick about the Omni Analytics platform and how they are adding automatic data modeling to your business intelligence. So, Chris, can you start by introducing yourself?
[00:01:02] Unknown:
Sure. And thanks for having me. I'm the cofounder and CTO of Omni. I've been working in the data management and analytics space for over 10 years now on various sorts of products, like BI tools, and most recently at a company called Stitch that does data integration.
[00:01:22] Unknown:
And do you remember how you first got started working in data? Yeah. So I kinda stumbled into it.
[00:01:29] Unknown:
Friends of friends were starting a BI company called RJ Metrics, and this was back in, like, 2009 or 2010. And I was kind of on the hunt for a new job for no particular reason and decided to check out what they were doing. I was really intrigued by the breadth of the problem and the open space to solve problems creatively. And, yeah, I was really excited about joining a startup because at the time, RJ Metrics was just the 2 cofounders. So I came on as the first engineer on the team. And so I spent about 6 or 7 years working on the RJ Metrics BI product, serving in various engineering and product roles.
And, you know, as a company, we actually gravitated towards the data integration side of the problem. We tried to tackle the entire problem from soup to nuts, beginning with actually doing the data integration, hosting the data warehouse, and the BI tool on top of it. And I think in many ways, we were trying to do too many things. But what ended up happening was that the place we got the most resonance in the market was actually the data integration that we were doing. And by that, I mean helping users acquire data from their databases and from platforms like Salesforce and Stripe and Marketo and sort of the whole laundry list of SaaS tools, and getting it into a queryable place for them, which, you know, in our case was our own data warehouse, but in many cases was their data warehouse. And eventually, that's where Stitch came from: the part of RJ Metrics that was resonating most was that data acquisition, data integration piece. And we decided to separate out that piece from the product, and that's what became Stitch. And so I kinda got the benefit of going from the BI space into this more focused data integration space with Stitch.
And, of course, you know, Stitch rode the wave of cloud data warehouses, and that allowed us to focus very heavily on just moving data from various source platforms into cloud data warehouses, predominantly Redshift, BigQuery, and Snowflake, and not really worrying too much about the open-ended transformation side of the problem, leaving that to the downstream tools. I wouldn't say we started the modern data stack way of looking at the problem, but I think we were involved in that moment in time, right, where it went from ETL to ELT, or at least ETLT, depending on who you talk to. And we were very much part of that movement and, I think, helped a lot of people move in that direction.
[00:04:29] Unknown:
And as far as the Omni Analytics platform that you're building now, I'm wondering if you can give a bit of an overview about what it is, some of the story behind how it got started, and why this particular area of focus is where you wanted to spend your time and energy now.
[00:04:44] Unknown:
The origin story is that myself and my 2 cofounders, first and foremost, are friends, and we all worked on different parts of the data stack over the past 10 years or so. Like I said, I went from business intelligence to data integration. My cofounders, Colin and Jamie, actually worked at Looker for most of the past 10 years in various product leadership roles, so they were very much on the BI side of the problem. Through a series of conversations over the years, we all rallied around this idea that, despite all the great advancements in cloud data warehouses and tons of really awesome tools in the modern data stack toolkit,
It was still really hard to work with. Right? And while maybe some users got some better tools in their toolkit to deal with it, there was still a lot of opportunity left on the table. And like I said earlier, I personally had some desire to go back to BI, which is kinda where I started with RJ Metrics, and revisit the problem a little bit with the benefit of more experience. And also, having been at Stitch for a while, we were kind of the middleware, the behind-the-scenes data pipes. And I was personally really excited to get back closer to the end user and the point of data consumption, just because it always felt like there was a lot of leverage to solve real problems there. Right? At Stitch, I always wanted to help people understand how fresh their data was, because we knew that really well. But they would have to open our tool to go do it. Right? They wanted to see that within the dashboard that they were building. So that experience made me really excited to get back closer to the end user. And like I said, more generally, we just felt like there was a lot of opportunity left on the table to make data easier to use. And I think specifically where we felt like there was a good opportunity was in the world of data modeling. And, you know, in addition to building Stitch and RJ Metrics, through that experience I was actually able to both create and participate in a couple of open source projects.
1 was Singer, which was the data integration framework that Stitch used to power its integrations, and the other was dbt. I worked with the founders of dbt Labs. They were all employees at RJ Metrics, and I was actually part of the team that built the first version of dbt, and I have since been an avid user of dbt. And so through that experience, I was trained to think about data modeling. Right? And, of course, my cofounders at Looker, very much the same thing. Right? A different tool used to do the data modeling with Looker, but fundamentally getting at the same problem, which is that data modeling gives you a lot of leverage. Right? It gives you the leverage to ensure your data is accurate. It gives you the leverage to curate data that maybe a less technical user could then go and use, and it also addresses security and control concerns. And so what we observed, and what we're trying to do with Omni, is play into that data modeling and help more people do it, but also recognize that some of the patterns that we've developed and the processes that have developed around data modeling today are full of friction and are pretty inefficient.
And so, you know, what we see is that whether you're using Looker or dbt or any other approach to data modeling, you end up with the data modelers over here on 1 team, and the analysts and users of the data on another team or elsewhere in the organization. And the handoff between the 2 can be slow, can be inefficient, can be full of waiting. And that is obviously inefficient on its face. Sometimes that inefficiency is worth it, if you're trying to build a deck for your executive team to review and you really wanna make sure you get it right. But sometimes that's not really an appropriate trade-off to make when you just wanna answer a question quickly, when you could get started immediately and start answering that question by writing SQL or looking at the data yourself. And we observed this in our own experience: we saw a lot of people actually using 2 separate tools for each of those modes of analytics. Using, like, a Looker or a Tableau for the governed metrics and the governed data, and then using a different tool, or maybe even just a SQL querying interface like Snowflake's or BigQuery's built-in interface, for doing more of the ad hoc exploration. But we think that that 2-tool approach leaves a lot on the table because, actually, we think of it as more of a progression. Right? Where you start ad hoc, and then that gradually evolves into something that is more important and that the data model sort of comes out of.
And so that was the key insight that led us to starting Omni, and that's the key differentiator we think of with Omni: we wanna be able to sit in the middle and be both an ad hoc querying tool and a model-governed BI tool, and have fast, efficient, evolutionary paths to go from 1 end of the spectrum to the other. And an interesting aspect of this
[00:10:13] Unknown:
problem area is, 1, business intelligence as a product category has a very long history. It has gone through a lot of generational shifts. But also, there is a lot of debate and conversation happening right now about this modeling and semantics layer of the analytics and business intelligence experience, and whether that actually belongs in the business intelligence tool, or as a standalone layer, or in the data warehouse. And everybody has a slightly different take on how they want to see that manifest. And I'm wondering what your perspective is on that conversation and some of the ways that you're thinking about being able to take some of the automated modeling that you're building in Omni and not have that be locked into Omni exclusively. That has been a big pain point for business intelligence systems to date, where there is a lot of this domain and semantic modeling happening in business intelligence, but then it has to be rebuilt somewhere else because you also need those same models in, you know, your machine learning system.
[00:11:13] Unknown:
Yeah. Absolutely. I would say we have a big-tent view of this particular topic. The more nuanced explanation is that whether you're talking about using dbt to materialize your models in the warehouse, or using a semantic layer to sit on top of those models and expose them in a different way, or using a modeling system built into the BI tool, we think all of those approaches, or a combination of them, can be appropriate depending on the use case. And we actually just want to provide seamless paths between them. Right? And so the way that Omni works is that you can start querying immediately. And this is 1 of the beautiful things about SQL as an abstraction: there's really no distinction between a query and a model. Right? It's just a query.
And so what you can do in Omni is promote a query that you have written to be part of your shared data model, so that it can be reused by yourself and other people within your organization. But you can actually go a step further than that, too, and promote it from Omni into your dbt repository. Right? And so we view it as, you know, we think dbt is great, but we also think that if, to answer a quick ad hoc question, you have to go build a dbt model before you can answer the question, that is an inefficient workflow.
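Omni's actual promotion mechanics aren't described here, but the general idea of promoting an ad-hoc query into a dbt repository is simple, because dbt treats any `.sql` file under its models directory as a model. This is a minimal, hypothetical sketch (the function name and directory layout are assumptions, not Omni's or dbt Labs' API):

```python
from pathlib import Path

def promote_to_dbt(query_sql: str, model_name: str, models_dir: str = "models") -> Path:
    """Persist an ad-hoc SQL query as a dbt model file.

    dbt picks up any .sql file under the models directory as a model,
    so 'promotion' here is just writing the query out with a config
    header that controls how dbt materializes it.
    """
    path = Path(models_dir) / f"{model_name}.sql"
    path.parent.mkdir(parents=True, exist_ok=True)
    contents = (
        "{{ config(materialized='view') }}\n\n"  # materialize as a view by default
        + query_sql.strip() + "\n"
    )
    path.write_text(contents)
    return path
```

From there, `dbt run` would build the promoted query like any other model; switching the config to `materialized='table'` trades storage for query speed.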
And so our view of the world is that you promote as those models become more critical and mature. And there are certainly also optimization considerations here. Right? If you wanna materialize the data into a table for the sake of speed and efficiency, then dbt is a great tool to use for that. And we're gonna try to provide you a good workflow and user experience to promote from the BI tool into that dbt layer, if that's what you're using. And then I think the other side of this is, you know, long term, and this is not something we have today, we wanna build our platform in a very open way, because we do think the ecosystem of data tools today is great, and we want to allow people to leverage those tools. So we want to build in such a way that you're certainly not going to be locked into just using your Omni data model with Omni charts and dashboards, because there are just so many other great tools out there. Right? Some people may need a notebooking experience. Other people might just wanna use their favorite query runner and integrate with the model. And so we're still kind of roadmapping out exactly what form this will take. But over time, we do expect to expose APIs, potentially things like JDBC and ODBC drivers, that would allow other tools to query on top of your model, but still leverage the model knowledge that Omni has.
[00:14:13] Unknown:
And given your team's background in this BI space, what are some of the technical and organizational antipatterns that you see growing up around the development and usage of BI systems and some of the ways that you're looking to counteract that with this automated modeling capability in Omni?
[00:14:34] Unknown:
Yeah. So there's a few, and I alluded to some a minute ago, in the sense that we think the idea of having to go wait on somebody to build you a data model to do a quick ad hoc exploration is a fairly broken process and workflow. The most extreme version of that is the, oh, I need this additional dimension added to my report, and having to wait in the queue for a week to do that. Right? Which probably is not a very difficult task. So I think that's 1 of the key workflows that we think is broken and that we want to improve. Another related 1 is that data models bring their own maintenance burden. Right? If you have a data model, I'm sure that the week you built it, it's clean and everything is accurate and not out of date. But from experience, we've seen that over time, you've built a piece of the model, and maybe that ends up being superseded by a different model somewhere else, but you sort of forget that the old 1 exists, and you end up just building up a lot of cruft in your data model. And so in addition to automating and helping you build the model, we also wanna help you maintain it. Right? Whether that's pointing out duplicative logic, pointing out unused logic, or pointing out logic that we've recognized looks erroneous or looks like it's changed dramatically.
I think that's the sort of thing that, particularly over time, we hope we can build: not just the helpers for creating new model logic, but also for curating and maintaining that model as well.
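The two kinds of cruft mentioned here, duplicative logic and unused logic, can be illustrated with a small audit sketch. Nothing below reflects Omni's actual implementation; the model representation (field name mapped to SQL expression) and field names are invented for illustration:

```python
import re
from collections import defaultdict

def audit_model(fields: dict[str, str], used_fields: set[str]):
    """Flag two common kinds of model cruft: duplicate definitions
    (the same expression registered under different names) and fields
    that no saved query references."""
    def normalize(expr: str) -> str:
        # Collapse whitespace and case so trivially different spellings match.
        return re.sub(r"\s+", " ", expr.strip().lower())

    by_expr = defaultdict(list)
    for name, expr in fields.items():
        by_expr[normalize(expr)].append(name)

    duplicates = [names for names in by_expr.values() if len(names) > 1]
    unused = sorted(set(fields) - used_fields)
    return duplicates, unused
```

A real system would normalize expressions with a SQL parser rather than whitespace collapsing, and would derive `used_fields` from query logs, but the shape of the check is the same.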
[00:16:14] Unknown:
Another aspect of business intelligence that makes it so complex to deal with is that it is kind of the confluence point in an organization of multiple different types of stakeholders and contributors and consumers of data. And I'm wondering, what are the key elements of that juncture that make business intelligence such a difficult product to employ effectively?
[00:16:39] Unknown:
Yeah. You know, a key part of our vision is to meet the users of data where they are. And we don't just mean that for the person who purchases Omni. We mean it for every person in their organization. Like I mentioned earlier, we recognize that, particularly as more use cases like data science come into play, charts and dashboards are not sufficient to solve the entire data problem. Right? And so the way we envision this is, we started with a couple of primitives within Omni, those primitives being charts and dashboards, and we have topics, which we introduced recently, which is a more flexible way to explore data without being completely locked into a dashboard or chart. And I think that's the beginning of a longer vision in terms of providing experiences, and even sub-products within Omni and integrations with other products outside of Omni, that are targeting certain types of users and use cases. And I think that, to me, is a big part of it. I think the best part about BI is that it's purchased with a lot of aspirational intent to be the system of truth and the place where we will answer all of our data questions. But as you alluded to, I think reality tends to not quite reach that aspiration. And we think that 1 of the keys to actually getting there is to recognize that charts and dashboards are not enough. They're usually good for a lot of things, but, you know, they're not interactive.
They are not solving every type of data problem that exists out there. 1 of my favorite examples is the finance team. Right? The finance team is gonna live in Excel no matter what kind of tool you decide to introduce beside them. Like, they're going to live in Excel. And 1 of the things that makes me cringe is the passing of Excel sheets back and forth between drives and emails, and then not even having any reconciliation between that Excel sheet and what's going on in the data warehouse and the BI side of the world. And so 1 of our goals is, using the data model as the foundation, to be able to provide, whether through integrations or first-party experiences within Omni, the different kinds of tools and user experiences that those different types of users need, but all backed by a model that will give them some confidence that the data they're looking at in this department is the same as the data that the other department is using as well.
So I think that's part of the vision, but I completely agree that, you know, up to this point, that is not a particularly well solved problem.
[00:19:34] Unknown:
Another complaint that I've seen levied against BI systems frequently is, to your point of charts and dashboards, it's all well and good to be able to see, okay, this is what this chart is showing, but what does that actually mean, or what is some concrete action that I can take as a result of this? And then having to actually say, okay, here's an answer that I'm getting here; now I have to go into some other system, remember what I've learned from the BI platform, and then take some sort of action from that. And that's where you've seen things like reverse ETL be some form of answer to that, or there are some business intelligence systems whose focus is on being able to say, okay, based on this, we think you should take this action, here's a button to do that, or being able to have that be a place where you can ask questions of other people. And I'm wondering if you can just talk to the overall scope of Omni and what it is that you're specifically trying to achieve in terms of, you know, is it just analytics or predictive analytics or prescriptive analytics or actionable analytics or whatever kind of moniker you wanna put on it? Yeah. And, you know, I mean, I think,
[00:20:40] Unknown:
not to downplay the question, but I do think a lot of that is marketing. But, again, the aspiration is real and good and makes sense: showing the data is 1 thing, but until you act on it, you haven't really created a whole lot of value. And the way I think about this is, the baby step towards it is just being able to embed the analytics and the visual charting and graphing into the place where it is most likely to be used, and with the context. So, right, if you're trying to help a customer success person understand how their customer is using the product, and they live in Salesforce, embed those charts into Salesforce. Right? Don't require them to flip back and forth. So I think that's step 1. And then, to your point, there's the entire broad category of data science, which often boils down to, how can you turn this fairly abstract signal into maybe a categorization of an action that might need to take place, or something like that. And the way we view that is, like I mentioned earlier, whether it's first-party features that we eventually build or, more likely in the near term, integrations with other tools in the ecosystem, we want our platform to be open to integrate with those tools. And certainly over time, I would expect that our functionality will include those sorts of things.
But at the end of the day, when I think about my initial vision of just making data easier to work with, I actually feel like there's a lot of simple stuff that needs to be easier and better, that needs to be solved, before we try to tackle some of the more advanced, automated inference types of problems.
So, you know, absolutely part of the vision, but we think that step 1 is to actually get the groundwork laid and make the basic things easier. And eventually, over time, we'll get to that point where some of these more advanced, computer-assisted action
[00:22:51] Unknown:
suggesting and taking takes place as well. So in terms of the actual platform that you're building, can you describe a bit about the implementation details and some of the ways that you're thinking about the technical and user experience design around it?
[00:23:06] Unknown:
Sure. Yeah. There are a couple of key tenets of how we're building the product that I think are pretty interesting. The first 1 is, of course, like I said, that the data model is a core piece of the product, very much the foundation of the product. And the way our data model works is kind of interesting in that we actually think of it as a layered data model. The bottom layer of our data model is actually in the database: it's the schema of the database that we're operating on top of. The layer on top of that is what we call our shared model. That's where the curated data model lives; it might have fact tables, renamed fields, or any number of customizations that are consistent across the organization. And so that's your starting point whenever you're doing analysis in Omni. When I think of the model, that's actually the model that I think of. It's a layer on top of the database model. And then, further, the system is actually built in such a way that the model can be layered and composed almost infinitely. But in practice, the next layer is what we call a workbook model. When you go to actually start doing work in Omni, you create a new workbook model, which ends up becoming your sandbox for that workbook.
And you can start modifying and overriding things in the model layers beneath you, but that is just going to live in your workbook model until you decide to either promote it into that shared model or, depending on the permission model applied to your organization, maybe open up essentially a pull request to ask to promote it into that shared model. I think this is 1 of the unique things that we've done: we've allowed you to sandbox, start from the model, and then, in isolation, start to modify that model and curate it for yourself, or potentially just add to it, right, and start adding tables. And so this is an example of the ad hoc, I-just-wanna-answer-a-question use case. You might not really be modifying the model much at all, or you might be creating some new calculations that, at the end of your ad hoc exploration, you decide are useless. Right? You've come to no conclusions. You're not interested in this problem anymore. And if you do that, then, okay, those are just in your workbook model and will really not affect the rest of the system at all until that workbook is viewed again. And so I think that's 1 of the key parts of the system that is giving us a lot of leverage to think of this modeling exercise as evolutionary, because we're thinking of it as a layer cake that can be promoted between layers when it is appropriate. Another piece that is very core to what we're doing, which I mentioned earlier, is integrating with the ecosystem.
Our first version of this is our integration with dbt. Not only can we integrate with your dbt repository to get metadata and lineage information about your data, but we can also promote back to your dbt repository, so that for things where it's appropriate to materialize them and manage them with dbt, you can do that seamlessly. Making integration with the ecosystem, like dbt, a first-class feature of the product is central to that idea.
[00:26:46] Unknown:
And as you have been building this platform and exploring the problem space, I'm wondering what are some of the ways that the goals and vision of the product and the actual implementation details have changed or evolved?
[00:26:59] Unknown:
One thing we're doing in Omni that's actually pretty cool is that we're not only generating SQL based on your interactions with our user interface, we're parsing SQL as well. You can open up Omni, open up the SQL query pane, and start typing SQL, and any SQL that's valid to run against your warehouse, we will run. We'll also attempt to parse that SQL. Our goal is to make it so that whether you want to write SQL or interact with our UI, the experience is equivalent.
What we've found is that the parsing piece can be jarring when you go back and forth between the two. Abstractly, we like the idea, but from a user experience perspective, you end up confusing yourself if you write some SQL, then interact with our UI, then go back to writing SQL. So we're still trying to nail that user experience. The original vision was to make those things equivalent and allow you to bounce back and forth seamlessly; increasingly, it feels like the SQL parsing will turn into suggestions of model pieces that you might be interested in and can keep in the model, rather than a full one-to-one equivalence. That's one example where we're trying to feel out exactly what the user wants. Similarly, we're all fairly technical, either in the engineering sense or the analytics sense; a lot of our organization has done one of those jobs, or both, for a long time. What I've noticed is that we tend to build the power version of the tool first, and then take a step back and ask: how do we reduce and simplify this so that it's understandable to anyone, or at least to the part of the audience we think is most relevant? That has ended up being our process: we build in a lot of functionality and capability, and then we take a refinement pass over the user experience and ask how to make it understandable to somebody who didn't build it.
I think that's actually been a good process, because you have a very creation-oriented mindset in the first phase, and then you switch to a refinement-and-reduction mindset to really nail it in the second phase.
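The idea of turning parsed SQL into model-piece suggestions, rather than full round-tripping, can be illustrated with a deliberately naive sketch that pulls aliased expressions out of a SELECT list as candidate model fields. A real implementation would use a proper SQL parser; this regex version only shows the shape of the idea:

```python
import re

def suggest_model_fields(sql: str) -> list[str]:
    """Naively extract aliased expressions from the SELECT list as candidate
    model fields. Illustrative only: a real tool would use a full SQL parser
    to handle subqueries, quoted identifiers, CTEs, and so on."""
    select = re.search(r"select\s+(.*?)\s+from\s", sql, re.I | re.S)
    if not select:
        return []
    # Each "expr AS alias" in the select list is a candidate reusable field.
    return re.findall(r"\bas\s+(\w+)", select.group(1), re.I)

sql = "SELECT amount * fx_rate AS amount_usd, created_at FROM orders"
print(suggest_model_fields(sql))  # ['amount_usd']
```

A derived column like `amount_usd` is exactly the kind of ad hoc calculation that might later be promoted into the shared model.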
[00:29:35] Unknown:
And digging into the workflow for a team using Omni, I'm wondering if you can talk through that. In particular, I'm curious about some of the ways that you address the tendency for multiple very close but slightly disparate models to grow up, and some of the challenges of the master data management question that comes up when you're trying to build these semantic models in a BI context.
[00:30:00] Unknown:
Yeah. In terms of the first part of the question, the ideal workflow is roughly what I described earlier: we think the best way to start is to just start answering questions and querying. Explicitly, if somebody wants to start by building their data model, then sure, more power to them. You can do that not only interactively within the tool, but also programmatically, using our syntax and IDE, which you can save in version control and so forth. If you're migrating from another data model or something like that, that's fine. But for anybody starting from scratch, our view is: just start answering questions, just start querying, and that will dictate what your data model is. As you answer related questions, it will become clear which parts of that data model are in fact reusable, and you can promote those into the shared model and continue from there.
So the abstract way we like to think of people using the product is to start by answering questions, and then come back and refine the model after you've answered the question, or as you're answering it. And what was the second part of the question? Sorry. Thinking about how to manage the tendency for there to be a sprawl of slightly different perspectives on what
[00:31:25] Unknown:
a semantic model is supposed to represent.
[00:31:28] Unknown:
Yes, exactly. Great point. We think the model maintenance problem is just as much of the problem as the model creation problem. Right now, there are a couple of ways we address it. There are permissioning systems you can apply in Omni that allow you to say: only these people can modify the shared model, so anyone else, and perhaps even those people, should go through a pull request or merge request, a change-management type of process, to change that model. That enables a human to do the curation, deduplication, and maintenance as it happens. But I think that's not sufficient. The other way we're solving this problem is by showing a history of the model, and increasingly the plan is to apply some intelligence on top of that history to flag things that may be worthy of further investigation.
Like: hey, this logic is the same as this logic; or, this part of the model has not been used for this many months, should you consider removing it? That sort of thing. So we view the maintenance problem as a combination of giving users a good view into the changes that are happening, and algorithmically helping them identify where there are lurking demons in terms of maintenance.
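Those two maintenance checks, flagging fields that share identical logic and fields that haven't been queried in months, can be sketched in a few lines. The field names, expressions, and dates here are invented for illustration; this is not Omni's model format:

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical model metadata: field name -> (SQL expression, last queried).
fields = {
    "revenue":     ("sum(orders.amount)", date(2023, 3, 1)),
    "total_sales": ("sum(orders.amount)", date(2023, 3, 5)),
    "old_metric":  ("count(legacy.id)",   date(2022, 6, 1)),
}

def maintenance_flags(today: date, stale_after: timedelta = timedelta(days=180)):
    """Flag duplicate expressions and fields unused past a threshold."""
    by_expr = defaultdict(list)
    for name, (expr, _) in fields.items():
        by_expr[expr].append(name)
    # Groups of 2+ fields with byte-identical logic are duplication candidates.
    duplicates = [names for names in by_expr.values() if len(names) > 1]
    # Fields not queried within the window are removal candidates.
    stale = [n for n, (_, last) in fields.items() if today - last > stale_after]
    return duplicates, stale

dups, stale = maintenance_flags(date(2023, 4, 1))
# dups -> [['revenue', 'total_sales']]; stale -> ['old_metric']
```

A production version would compare normalized query trees rather than raw strings, but the surfacing pattern is the same: present the flags to a human for the final curation call.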
[00:33:09] Unknown:
The other interesting aspect of building a BI platform, especially today given the number of options that are out there, is how the evolution of the surrounding ecosystem has influenced the realm of the possible. What are the shifts in the surrounding technical capabilities that led to your decision that now is the right time to embark on this new vision of what BI should be?
[00:33:39] Unknown:
Yeah. I mean, it's almost not new anymore, but the idea of having a data warehouse used to be limited to the very small percentage of companies who could afford Teradata racks, and now it's accessible to any company, at a very affordable starting price. Combined with that are tools like Stitch and, of course, Fivetran and some of the other data integration tools out there that help you actually get the data into the warehouse quickly, easily, and affordably. Those have been a huge enabler.
A lot of the most recent BI tools have certainly adapted to take advantage of that, but not many of them were actually architected from the ground up to work exclusively on cloud data warehouses. We shouldn't discount how big that wave is and how important it is, even if it feels like it was ages ago. The other piece that I think is very relevant to us, and something we're trying to do, is that in addition to making it fast to model, we think the speed of querying is an enormous enabler of being able to analyze and understand data quickly.
So everything we're building has speed and performance as a top priority. One of the things we're doing, and it feels like a very popular thing to do right now, is using Apache Arrow as our on-the-wire and on-disk data format, to reduce the cost of serialization and deserialization everywhere. We're also caching data in both our server tier and the browser in such a way that it can be re-queried using DuckDB, so that as much as possible we can answer questions without having to go back to the server, whether that's our server or the database server.
The goals there are being really fast and, hopefully, saving you a few bucks on that cloud data warehouse bill if it's usage based. Our ability to do that has been greatly accelerated by technologies like Arrow and DuckDB.
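The cost-saving half of that caching idea, answering repeated questions locally instead of re-hitting the warehouse, can be shown with a toy stdlib sketch. Omni does this with Arrow-format result sets re-queried by DuckDB; this example only illustrates the cache-key pattern, not their architecture:

```python
import hashlib

class ResultCache:
    """Toy result cache keyed on normalized SQL text, so trivially
    re-formatted repeats of the same query never reach the warehouse."""

    def __init__(self):
        self._store = {}
        self.warehouse_hits = 0  # how many queries actually hit the warehouse

    def _key(self, sql: str) -> str:
        # Collapse whitespace and case so "SELECT 1" and "select  1" match.
        normalized = " ".join(sql.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def query(self, sql, run_on_warehouse):
        key = self._key(sql)
        if key not in self._store:
            self.warehouse_hits += 1
            self._store[key] = run_on_warehouse(sql)
        return self._store[key]

cache = ResultCache()
rows = cache.query("SELECT 1", lambda s: [(1,)])
rows = cache.query("select  1", lambda s: [(1,)])  # normalized: served from cache
assert cache.warehouse_hits == 1
```

With a usage-based warehouse, every cache hit is a query you didn't pay for, which is the "save you a few bucks" point above.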
[00:36:21] Unknown:
And in terms of the ways that you're thinking about approaching the problem and the product that you're building, what are some of the positive and negative sources of inspiration that you've looked to?
[00:36:30] Unknown:
Yeah. Obviously, our own experience is probably the biggest influence in terms of inspiration. Like I said, the experience of having both built and used modeling-centric products like dbt and Looker has informed a fairly nuanced view about the role of modeling. While it has not turned us away from modeling, it has made us recognize that modeling is a good thing, but it's a tool to be wielded thoughtfully and carefully, not used as the answer to every single problem, use case, and workflow.
I think that's been a huge influence on how we've thought about the problem. Another one: even though we're trying to evolve the way people think about the data modeling workflow, and maybe make them think outside the box of "I need to go code up my dbt model or my LookML model," we also firmly believe that having a code representation of the model, and the software engineering workflow around it, is a power tool. It's a power tool for the power user, but it's a necessary component. In terms of positive influences, we very much view that as a key piece of the solution here; it's just that we're also trying to build a more interactive experience beside it, so that other users can work in a less software-engineering style of workflow when it's appropriate.
[00:38:16] Unknown:
And in your experience of building Omni and working with some of your customers on applying it to their specific problems, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:38:28] Unknown:
Yeah. A cool use case that has come up, one I've always been intrigued to try and that we're actually seeing with Omni, is using a data warehouse, data viz, BI approach to analyzing operational server logs. I've personally cringed at my Elasticsearch bill from ingesting all my logs into Elasticsearch and similar platforms in the past, and at the same time I never felt like I understood those query languages quite as well as I knew SQL.
So particularly with platforms like BigQuery and Snowflake, where you can keep data in cold storage very cheaply, this use case has always intrigued me, and we're actually seeing some use of this idea: server logs are being loaded into Snowflake and then queried using Omni. Omni is great for this use case because we have a lot of helper functionality for breaking apart JSON and exploring it; that syntax can get a little gnarly and difficult in raw SQL, but querying in Omni makes it fairly simple. I feel like that's a really cool use case that I would love to see more people adopt. It's a little too early to tell, but I think it probably ends up being cheaper than running massive Elasticsearch clusters or paying for some of the third-party tools, which can get quite expensive as well.
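The "breaking apart JSON" step that makes log analysis painful in raw SQL is essentially flattening nested records into dotted column names. Here's an illustrative stdlib sketch of that transformation, conceptually what BI-style JSON helpers do before the data is queryable as columns (the log shape is invented):

```python
import json

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested JSON into a single level of dotted column names,
    so each leaf value can be queried like an ordinary table column."""
    out = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))  # recurse into nesting
        else:
            out[name] = value
    return out

log_line = '{"ts": "2023-04-01T12:00:00Z", "http": {"status": 500, "path": "/api"}}'
print(flatten(json.loads(log_line)))
# {'ts': '2023-04-01T12:00:00Z', 'http.status': 500, 'http.path': '/api'}
```

Once flattened, a question like "count of 500s by path" becomes a plain GROUP BY instead of nested JSON-extraction syntax.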
[00:40:16] Unknown:
And in your experience of building the platform and the product, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:40:25] Unknown:
For me personally, having worked in the BI space for a while, then gotten away from it and been more in the data pipeline space for the past five or so years, I had lost sight of how big and complex the visualization feature and problem itself is. Of course, data viz is a feature of BI and by no means the only one, although it's a very important one. But even within visualization, there's such a rich set of research about how to visualize data, and both commercial and open source libraries that help with taking structured data and mapping it into a visualization.
Then, as a BI tool, you have to solve the problem of not only how to present the user with a way to format the visualization based on their data, but whether you can actually do some of that for them, and at least give them a good guess at what they probably want to see when they open up a chart visual on top of a dataset. I was quickly reminded, after having been away from it for a while, how deep that topic is. As a BI tool, we know we want to be great at visualization, and we expect to invest heavily in it. Shameless plug: this is why we're hiring visualization engineers right now.
It's a topic that is both very deep and very difficult, because the problem itself is intertwined. If you want to add a grouping to an existing visual, it might impact some of the other existing encodings in the visual. How do you do that in such a way that the user isn't surprised by what they see? That's a fairly complex topic, and I've come to appreciate how much academic research goes into it. It's something we're already investing heavily in and expect to invest even more heavily in over time.
[00:42:33] Unknown:
And for people who are evaluating which business intelligence system to use, or who are already using a business intelligence system and wondering whether it makes sense to add Omni to the stack, what are the cases where Omni Analytics is the wrong choice?
[00:42:50] Unknown:
Yeah. Omni operates on top of a warehouse, so if your team does not have the capability to operate a warehouse or get data into a warehouse, then Omni definitely wouldn't be a good choice. The good news is that both of those problems have become easier over time, with very low-maintenance platforms like Snowflake and BigQuery in particular, and tools like Stitch and Fivetran for getting data into the warehouse. Additionally, when you start with Omni, you have to build the model and the queries yourself. Over time, I imagine we'll have some templating and so forth, but you still have to play a very active role in deciding what to query and actually building that query, maybe not with SQL, but at least with our interactive query builder.
If the company doesn't have that skill set and is just looking for more premade analysis, then I don't know that Omni would be the best choice. There are a lot of great platforms out there focused on answering questions about customer success, product funnels, or any number of things; I'm sure there's a product for just about every question you could imagine, and in some cases those would be more appropriate than Omni.
[00:44:24] Unknown:
And as you continue to build and iterate on the Omni platform, what are some of the things you have planned for the near to medium term, or any particular problem areas or projects that you're excited to dig into?
[00:44:35] Unknown:
Yeah, great question. The road map is long, so let me call out some highlights. I mentioned the dbt integration: right now, we have an integration to help you generate dbt models. The road map for that feature, which I expect we'll achieve in the coming months, is to not only help you generate the model, but automatically sync it with your dbt repository, and also read the metadata from the dbt repository to inform Omni about lineage and other metadata we can glean from it. That's one I'm pretty excited about.
Another is that we're planning to invest heavily in what we call our IDE experience, the part of our app that allows you to programmatically define the data model. Right now, we have a basic text editor, but the near-term road map is to build auto-completion, suggestions, and validation into that editor, and also to integrate with Git so that the code representation of the data model can be version controlled and managed that way. I imagine we'll also be able to build integrations with third-party editors like VS Code, for folks who want to work outside of a web browser. That entire universe of functionality around programmatically defining and managing the data model is a big area of focus for the coming months.
In addition to that, we have dashboards that you can build interactively, but we also want to treat dashboards as a code artifact that you can manage as code. That's particularly important for use cases like embedded analytics, which is what we call application developers using Omni to expose data analytics in their own apps; the software engineering workflows become really important there. The last feature to mention is that embedded analytics use case: we're going to be working on both the permissioning model and an embedding model to allow people to embed charts and dashboards into their own apps, and to apply things like row-level filtering based on user attributes, so that only the appropriate things are shown when that embedding happens.
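Row-level filtering for embedded analytics typically means mapping the embedding user's attributes onto a predicate that is appended to every query they run. This is a hypothetical sketch of that pattern, not Omni's actual API, and a real implementation would use bound parameters rather than string interpolation to avoid SQL injection:

```python
def apply_row_filter(sql: str, user_attrs: dict) -> str:
    """Wrap a query so the embedding user only sees rows matching their
    attributes. Illustrative only: production code must bind parameters
    instead of interpolating values into the SQL string."""
    clauses = [f"{col} = '{val}'" for col, val in user_attrs.items()]
    if not clauses:
        return sql  # no attributes -> no restriction
    return f"SELECT * FROM ({sql}) AS q WHERE " + " AND ".join(clauses)

filtered = apply_row_filter(
    "SELECT region, revenue FROM sales",
    {"region": "EMEA"},
)
print(filtered)
# SELECT * FROM (SELECT region, revenue FROM sales) AS q WHERE region = 'EMEA'
```

The key design point is that the filter is applied by the platform, outside the embedded user's control, so a tenant can never widen their own visibility by editing the chart's query.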
[00:47:12] Unknown:
Are there any other aspects of what you're building at Omni or this overall problem of model generation and maintenance that we didn't discuss yet that you'd like to cover before we close out the show?
[00:47:23] Unknown:
I think we covered a lot of it. The questions were great; you hit the creation points, the maintenance points,
[00:47:30] Unknown:
and the assisting of users. So I think we hit a lot of it. Thank you. Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:47:49] Unknown:
Good question. I go back to that abstract idea that data is still too hard to work with. The way I would characterize the gap is this: we have the pre-canned report on one end of the spectrum and full-blown querying on the other, and we haven't been thoughtful enough about all the different types of users, use cases, and experiences that fill in between those two ends.
When I talk about data still being too hard to work with, I don't think it's some magical assistive AI that's going to make it easier. Going back to that phrase, it's about meeting the users where they are: being really thoughtful about different use cases and being able to deliver high-quality, quote-unquote good data to the type of experience that is appropriate for the user, whether that's a chart or dashboard on the reporting end of the spectrum, or a spreadsheet, a notebook, maybe a data application, or a custom-built app in some other cases. That's where I see a lot of opportunity, because data management is only as good as how well it helps users actually use data, use it correctly, and get actionable insights out of it. I think that's where we're falling down: for most people, the types of data experiences we're trying to give them aren't really that useful.
[00:49:36] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing at Omni Analytics. It's definitely a very interesting product, and it's great to see this challenge of modeling built into some of the core workflows of these BI systems. I appreciate all of the time and energy that you folks are putting into that, and I hope you enjoy the rest of your day. Thanks, Tobias. You too. Bye.
[00:50:03] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it: email hosts@dataengineeringpodcast.com with your story.
And to help other people find the show, please leave a review on Apple Podcasts, and tell your friends and coworkers.
Introduction to Chris Merrick and Omni Analytics
Chris Merrick's Journey in Data Management
The Origin and Vision of Omni Analytics
Challenges in Business Intelligence and Data Modeling
Technical Implementation of OmniAnalytics
Evolution of BI Tools and Ecosystem Influence
Lessons Learned and Visualization Challenges
Future Roadmap and Features of Omni Analytics