Summary
Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is through a web or desktop application running on a powerful laptop. Zing Data is building a mobile native platform for business intelligence. This opens the door for busy employees to access and analyze their company information away from their desk, but it has the more powerful effect of bringing first-class support to companies operating in mobile-first economies. In this episode Sabin Thomas shares his experiences building the platform and the interesting ways that it is being used.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping to precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24×7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.
- Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more.
- Your host is Tobias Macey and today I’m interviewing Sabin Thomas about Zing Data, a mobile-friendly business intelligence platform
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Zing Data is and the story behind it?
- Why is mobile access to a business intelligence system important?
- What does it mean for a business intelligence system to be mobile friendly? (e.g. just looking at charts vs. creating reports, etc.)
- What are the interaction patterns that don’t translate well to mobile from web or desktop BI systems?
- What are the new interaction patterns that are enabled by the mobile experience?
- What are the capabilities that a native app can provide which would be clunky or impossible as a web app on a mobile device?
- Who are the personas that benefit from a product like Zing Data?
- Can you describe how the platform (backend and app) is implemented?
- How have the design and goals of the system changed/evolved since you started working on it?
- Can you describe a typical workflow for a team that uses Zing?
- Is it typically the sole/primary BI system, or is it more of an augmentation?
- What are the most interesting, innovative, or unexpected ways that you have seen Zing used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Zing?
- When is Zing the wrong choice?
- What do you have planned for the future of Zing Data?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Your host is Tobias Macey. And today, I'm interviewing Sabin Thomas about Zing Data, a mobile-friendly business intelligence platform. So, Sabin, can you start by introducing yourself?
[00:01:37] Unknown:
Hey, everyone. I'm a very long-time listener. My name is Sabin Thomas. I am the CTO and cofounder of Zing Data.
[00:01:44] Unknown:
And do you remember how you first got started working in data?
[00:01:47] Unknown:
Yeah. Absolutely. Not to give away my age, but one of my first jobs out of college was working on a reporting system for an HR data software company, and I remember diving into SQL dialects. We were using an early version of SQL Server back then, and a lot of it was constructing the right pivot tables to get at the data fast. That was my first foray into working with data, and I have loved it since. Fast forward a couple of years: this is early 2010, and I found myself working at Rakuten, which is fairly big in the ecommerce space, as a director of engineering. My task there for the teams that I was managing was working on their data infrastructure for payment data.
This was around the time of Hadoop, and Apache Spark wasn't even 1.0 yet. So it was a great space to be in: processing these large volumes of data and seeing the shift from Hadoop to streaming data make its way. So that's my path into data, and now to Zing Data.
[00:02:54] Unknown:
So in terms of the Zing Data project, can you describe a bit about what it is that you're building there and some of the story behind how it got started, and why you decided that this was where you wanted to spend your time and energy?
[00:03:05] Unknown:
So Zing Data, at its core, is simple, social business intelligence for everybody. We say that because we ourselves ran into that problem: working in engineering, engineering leadership, and product management, I never fully had data at my fingertips. There was always something that I had to pull up on my laptop or my desktop to be able to query. So in the middle of a meeting or an active product discussion, not being able to see what the analytics, user data, or product usage data looked like seemed to be a problem. Having faced this ourselves, we said there's gotta be a better way to access it. That's the problem statement we started out with, and to prove it out, we built an early, mobile-focused version of Zing Data to give everyone access to data at their fingertips.
A good number of engineering challenges came up as a result, and we can dive into those. But just as a use case, we found it extremely useful for ourselves. Then when we put the product out there, we were quite surprised by the types of users signing up: people who traditionally would never have had access to a data engineering team, a partner, or data visualization software. We were seeing truck drivers signing up, and inventory managers at warehouses signing up. These are traditionally non-data use cases that signed up organically, and they told us the problem set is actually a lot bigger than we thought.
[00:04:39] Unknown:
With this idea of mobile access to business intelligence, on the one hand, you could say, well, I've got that with any web-based business intelligence platform because I can load the site and look at the charts. I'm wondering if you can provide some nuance on what it means for business intelligence to be used in a mobile context.
[00:05:00] Unknown:
Yeah. Absolutely. So the premise here is that, as a result of the pandemic, everybody's been displaced in some fashion. And surprisingly, we've seen people get a good amount of their work done not just at their laptop, but on their phone. If you look at Slack, a huge percentage of people use it directly on their phone to get work done: we're looking at 76% of monthly active users using Slack on their phone. The same goes for Google Docs and G Suite. So there are a good lot of productivity use cases that can be done mobile first. The challenge, in the context you mentioned of existing BI software, is the interactivity: the display is rendered in a mobile browser, and then you have to pinch and zoom to get into the actual data, which is quite problematic. That itself is a scenario I have personally dealt with.
Beyond that, there's a whole back-end engineering architecture difference in how you represent data that's made to be mobile first. Our access is ubiquitous: we have a mobile app, and we have web apps as well. So we've seen how you can represent data differently, and the architecture to represent it mobile first is quite different. Things you have to take into account include the small form factor. If you're looking at a dataset on your phone, there is no reason for you to pull down 40,000 rows from a SQL query result. That just doesn't make sense, and it doesn't help you do further data analysis. There's also interactivity. I mentioned the pinching and having to squint into the results. Interactivity with mobile apps is almost second nature at this point: when you open up Uber, you know to tap, to swipe, and to long press to get to certain actions. These are actually the basics of how we've designed our app as well, so there's an interactivity difference there.
And then I think the last thing here is the query composition piece. If you're trying to pull up sales data by region, we know that you're on your phone, you're in a particular location, and you have particular connectivity, 3G or 5G, whatever it is. You might background the app to go into another app and then come back to this one. The mobile architecture to handle all that is pretty complex, and it's something we've taken a long, hard look at so we could design it right from the ground up. I think we've done that very well with Zing Data.
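To make the 40,000-row point concrete, here is a minimal sketch of capping a result set on the server before it reaches a phone. It is an illustration under stated assumptions, not Zing Data's implementation: the `capped_query` helper and the `MOBILE_ROW_CAP` value are hypothetical, and SQLite stands in for whatever data source is actually connected.

```python
import sqlite3

# Hypothetical row budget for a phone-sized result view; the episode gives no real figure.
MOBILE_ROW_CAP = 500


def capped_query(conn: sqlite3.Connection, sql: str, cap: int = MOBILE_ROW_CAP):
    """Run `sql`, fetch at most `cap` rows, and report whether the result was cut off."""
    # Wrap the caller's query as a subquery and request one extra row, so truncation
    # can be detected without counting the full result set. `sql` must not end in ';'.
    wrapped = f"SELECT * FROM ({sql}) LIMIT ?"
    rows = conn.execute(wrapped, (cap + 1,)).fetchall()
    return rows[:cap], len(rows) > cap


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(i, i * 1.5) for i in range(40_000)])
    rows, truncated = capped_query(conn, "SELECT id, amount FROM sales")
    print(len(rows), truncated)  # 500 True
```

Fetching one extra row is a cheap way to tell the client "there is more" without running a separate `COUNT(*)` over the full result.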
[00:07:28] Unknown:
To your point about that query composition, having been somebody who has used my phone for a number of things that most people don't bother with, like SSH connections, etcetera, I don't imagine that sitting there and writing long or complex SQL queries is something I'm really gonna wanna do on a mobile device. So I'm curious if you can talk to some of the real-world usage: what types of queries people actually want to write or generate from a mobile device, and the types of questions they're trying to answer in that mobile context versus the more full-fledged data exploration they might be doing on their laptop. And what are some of the ways you think about enabling them to answer more complex questions without having to sit and write out all the SQL on their phone keyboard?
[00:08:14] Unknown:
Yeah. Absolutely. Writing out SQL on your mobile keyboard is generally not advisable, so this is where we've thought about how that interactivity works. We actually have a low-code interface where you can tap into a table, long press on a column name, and select the filters you want, all without having to type SQL. That mechanism lets you get to a result pretty quickly. On the back end, we're doing fairly intelligent query composition: for the categories and attributes you're trying to view in your query result, we work out what the ranking in the window function looks like, and we sample that data down for you if it ends up being a large result set. That's the query intelligence layer, if you will, that's happening behind the dataset. Then there's handling connections: SSH connectivity, direct peering across AWS networks, VPC peering, and SSH tunneling to get even more granular access. All of these things are key tenets of how you would normally access data within enterprises, even across departments, and they're all things we've designed for so that we can leverage those broad access patterns.
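The ranking step can be sketched with a standard SQL window function. This is a minimal illustration assuming a "top N rows per category" rule, since the exact rule Zing applies isn't described in the episode: `ROW_NUMBER()` numbers the rows within each group, and the outer query keeps the first `n`. The table and column names and the `top_n_per_group_sql` helper are hypothetical, and SQLite (3.25 or newer, for window functions) stands in for the real data source.

```python
import sqlite3


def top_n_per_group_sql(table: str, group_col: str, measure_col: str, n: int) -> str:
    """Build SQL that keeps only the `n` largest `measure_col` rows within each `group_col`.

    Identifiers are interpolated directly, so they must come from trusted schema
    metadata (e.g. the column the user tapped), never from free-form text.
    """
    return (
        f"SELECT {group_col}, {measure_col} FROM ("
        f" SELECT {group_col}, {measure_col},"
        f" ROW_NUMBER() OVER (PARTITION BY {group_col} ORDER BY {measure_col} DESC) AS rn"
        f" FROM {table}"
        f") WHERE rn <= {int(n)}"
        f" ORDER BY {group_col}, {measure_col} DESC"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("east", 30), ("east", 20), ("west", 5), ("west", 15), ("north", 7)],
    )
    sql = top_n_per_group_sql("sales", "region", "amount", 2)
    print(conn.execute(sql).fetchall())
    # [('east', 30), ('east', 20), ('north', 7), ('west', 15), ('west', 5)]
```

Ranking inside the database keeps the per-category shape of the answer while bounding how many rows travel over a mobile connection.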
[00:09:24] Unknown:
Another aspect of a mobile business intelligence system is that when you are mobile, there are particular ways that you want to interact with the data versus what you would want to do on a laptop or from a kind of full data exploration suite. And I'm curious if you can talk to some of the interaction patterns that have become common in these web based or desktop BI systems and where that breaks down moving to mobile? And then conversely, what are some of the interaction patterns that are unique to mobile that wouldn't make sense in a web or desktop first environment?
[00:10:03] Unknown:
Yeah. Absolutely. You're right. Thanks for clarifying that. We've done very limited marketing at this point in time, so all the usage we've seen has been completely organic. In the sign-ups we've seen, there's this concept of a deskless workforce: folks who are never working at a desk and are always on the go. The typical persona here is your salesperson or field sales folks going from meeting to meeting. But your CEO is doing the same thing, and everyone in your C-suite is also traveling between locations. At various points in time, they're trying to get access to data and can't pull up a laptop to get those results. Or, inevitably, they'll shoulder-tap somebody on the data engineering team to say, can you give me sales by region, because I want it cut this way. Among the usage we've seen is the example I mentioned earlier of a warehouse foreman at a trucking company. It's a pretty interesting use case. They have a usage pattern of getting access to what inventory data looks like at that point in time as they're running from warehouse to truck and back to warehouse.
This is not somebody who's at a laptop or at an office per se. They're probably in the truck, trying to see patterns in what the warehouse inventory looks like, and from that they're able to make decisions around trucking and route patterns. So that's one use case we've seen. Another use case we've come across has been a national concert operator, one of the biggest concert operators. We had a venue operator setting up for a pretty large concert who plugged the Zing Data app into her Google Sheets data of seating chart info. She's running around this concert venue trying to organize seating categorization and is looking at the Zing app to get those queries. This is, again, a use case of somebody not at a desk who would not be able to pre-render these charts by asking somebody on a data team. These are all live, ad hoc queries, and we think they're the best use cases for something like Zing.
As for where it breaks down going from desktop to mobile, the way we see it is ubiquitous access. If you're doing something on your mobile, you should be able to pick up where you left off on the web and continue that data analysis and exploration there. That's what we've designed for, and we keep it as a core tenet of our design principles. So you can run a query on your phone, get a result, comment and tag somebody else on that thread saying, hey, why are sales seemingly dropping during this week, and then pick up that conversation, in that thread and on that query, on the web. So for us, it's more ubiquitous.
Where we do see a difference with more desktop-based querying is where somebody needs to look at maybe a very small result set, cut it 4 or 5 different ways, and then write a machine learning model on top of that. We think that's a different use case that Zing does not necessarily solve for, nor are we interested in it at this point in time, but that's where we see some amount of difference.
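The comment-and-tag flow can be sketched as a thread keyed by query ID, so the same conversation is visible from the phone and from the web. This is a hypothetical model, not Zing's API: the `Thread` class, `add_comment`, and the `@name` tagging convention are assumptions made for illustration. It records the comment and returns one notification per tagged user, which a real system would hand to a push service.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed convention: users are tagged as @name (letters, digits, underscore).
MENTION = re.compile(r"@(\w+)")


@dataclass
class Thread:
    """A comment thread attached to one saved query, shared by mobile and web clients."""
    query_id: str
    participants: set[str] = field(default_factory=set)
    comments: list[tuple[str, str]] = field(default_factory=list)


def add_comment(thread: Thread, author: str, text: str) -> list[tuple[str, str]]:
    """Store a comment and return (recipient, message) pairs for everyone it tags."""
    thread.comments.append((author, text))
    thread.participants.add(author)
    tagged = set(MENTION.findall(text)) - {author}  # no self-notifications
    thread.participants |= tagged
    message = f"{author} mentioned you on query {thread.query_id}"
    return sorted((user, message) for user in tagged)


if __name__ == "__main__":
    thread = Thread("q42")
    print(add_comment(thread, "sabin", "@maria why are sales dropping this week? cc @raj"))
```

Keying the thread by the query rather than by the device is what lets a conversation started on a phone be continued on the web.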
[00:13:25] Unknown:
As far as the native app capabilities versus just designing a mobile-optimized web experience, you touched on this a little bit as far as being able to understand, okay, what is their actual connection type? What are the resources available on the device? I'm wondering if you can talk to some of the ways that you are engineering around this mobile native capability and some of the features that would be effectively impossible, or at least exceedingly difficult, if you were to try and stick with a web-first implementation?
[00:14:54] Unknown:
One example that has always bothered me with other solutions is a long-running query. You are running something against your Redshift cluster or your Databricks cluster, and it just takes time to crunch and compute. It is not feasible to have that longer response time when you're on a mobile phone. So a core way for us to solve that has been push notifications for long-running queries. You issue a query, background the app, go do something else, answer a couple of emails, and you get a push notification on your mobile phone saying, we have your results. Also, if you are in a comment thread on the result set, you can tag somebody and they get a push notification as well, and they get dropped right into those query results to analyze further. We think this is a really good example of a use case that you just couldn't do easily enough on the web. There are WebSockets that you could use to hold a persistent connection, but in the mobile use case, this is just much more native.
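The long-running-query pattern comes down to decoupling the query from the request: submit it, return immediately so the app can be backgrounded, and fire a notification when the work finishes. A minimal sketch using Python's `concurrent.futures`; `QueryRunner` is a hypothetical name, not Zing's code, and the `notify` callback is an assumed stand-in for a real push service such as APNs or Firebase Cloud Messaging.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class QueryRunner:
    """Runs queries off the request path and calls `notify` when each one settles."""

    def __init__(self, notify: Callable[[str, str], None], workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._notify = notify  # hypothetical hook; a real system would send a push here

    def submit(self, user: str, query_id: str, run: Callable[[], object]) -> Future:
        """Start `run` in the background and return at once with a Future."""
        future = self._pool.submit(run)

        def _on_done(f: Future) -> None:
            if f.cancelled():
                return
            error = f.exception()
            if error is None:
                self._notify(user, f"Results for {query_id} are ready")
            else:
                self._notify(user, f"Query {query_id} failed: {error}")

        future.add_done_callback(_on_done)
        return future

    def shutdown(self) -> None:
        """Wait for in-flight queries (and their notifications) to finish."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    runner = QueryRunner(lambda user, msg: print(f"push to {user}: {msg}"))

    def slow_query():
        time.sleep(0.5)  # stands in for a slow warehouse query
        return [("east", 30)]

    runner.submit("sabin", "q1", slow_query)
    print("app backgrounded; user is answering email")
    runner.shutdown()
```

Notifying on failure as well as success matters on mobile: a user who backgrounded the app would otherwise never learn the query died.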
[00:15:54] Unknown:
In the architecture of what you're building at Xyng Data, I'm wondering if you can talk to the breakdown of what are the types of operations that you're doing in the mobile device, how you think about being able to manage intermediate datasets so that you don't have to constantly be streaming information back and forth between the mobile client and the back end system and just some of the engineering challenges that you've had to work through as you've gone from idea to where you are now?
[00:16:24] Unknown:
Yeah. Absolutely. And I think that's something we're still iterating on. Some really good technologies have come out in the most recent few weeks, like Postgres WASM, which lets you run Postgres completely in the browser. That's been pretty interesting. There's DuckDB, which gives you really good single-core performance when you're running it on mobile, where compute resources are obviously limited. So there are some interesting projects out there that I think we can leverage. For us right now, we focus very much on the query intelligence piece, which is how you construct the query to get you the most interesting data quickly, and then iterate on those queries. I think there's a future where we could integrate those, where you would actually see more fine-grained analysis happen on your device's silicon.
But I think there are a few nuts and bolts to figure out before we can do that. A core part of our thinking has been to not have to run this on the mobile phone, given its limited resources and RAM. What's the best way to construct that query, to limit it, rank it, and window function it, so that you get the most appropriate result set? That way, the user has an initial dataset that they can run further queries off of.
[00:17:38] Unknown:
Digging a bit more into the mobile side of things, I've done a little bit of work on building mobile app clients, but it's definitely a very large and constantly evolving ecosystem. I'm wondering if you can talk to some of the ways that you've thought through how to implement the actual mobile app, where you are likely targeting both iOS and Android. Are you doing the React Native approach? Are you building native apps for both? Are you using one of the toolkits that allows you to cross-compile for different endpoints, for example building it in Go? Or are you just using the native app toolkits? And in that context, how do you think about abstracting the shared logic that is agnostic to the runtime and then adding the customizations for different device capabilities?
[00:18:31] Unknown:
Yeah. I think app development frameworks have come a long way, and every couple of years there's seemingly a new framework. We've paid special attention to how other startups have done this. Uber, when it first came out, had just an app for the iPhone and later added Android. The way we thought about it is, for a product like ours, which is essentially bringing data analysis into the palm of your hand, what gives you the most global reach? And surprisingly, mobile device usage for productivity apps in the United States is, I think, lagging behind other economies.
If you look at emerging economies, they've leapfrogged completely from dial-up to 5G wireless, where the mobile phone is your primary computing device. So we thought about that. We know that the split between iPhone and Android usage varies widely outside the United States, so for us it was important to be cross-platform from day one. Our SDK of choice has been Flutter, which Google put out about 6 years ago, if I get my math right. Flutter itself has been evolving pretty quickly over the past few years: when we started out with it, we got cross-compilation for iOS and Android out of the box, but it didn't have anything for web at that point. That only came later, after we'd already had a few iterations of the Zing Data app. So it's pretty interesting to see the pace of innovation, and this is something that could not have been done just a few years ago. There have been other toolkits like Cordova that allow this kind of cross-compilation, but there have always been issues with getting good usage of memory and GPU power.
Flutter, for now, is able to solve that for us. We are seeing some limitations that may cause us to pursue a more native app path. But at the stage of business we're in, Flutter gives us a universal code base that applies to three interface mechanisms and lets us get changes and iterations out faster. That's been key for us, but there's a future where we may dive into more native app variants. Flutter does allow you to write plugins on top of the compilation that give you more native interfaces, and that's something we're actively exploring right now.
[00:20:54] Unknown:
An interesting point that you brought up is mobile device usage in developing economies, where computers and laptops aren't the initial computing experience. That also raises some interesting questions about what additional features are useful or necessary for those types of users beyond just being able to look at some data and dig into it a little bit. I know that in the feature set you have right now, you can download a rendered version of a report or a CSV of the data. As somebody who's living in a first world country, my first reaction is, well, why would I wanna do that on my phone?
Right. Why do I care about that? But I'm curious if you can talk to some of the ways that you're thinking about the needs and requirements for some of these other types of users who aren't necessarily in a developed economy or who don't have a laptop and a tablet and a phone and all these other different computing devices to be able to work from.
[00:21:58] Unknown:
Yeah. Absolutely. One use case we've seen is a Latin American retail chain. They sell small furniture supplies, so it's not a grocery chain, and they have about 8 stores. They are currently using us to track inventory levels. The sign-up came from an Android device, which was the primary device for this one inventory manager tracking inventory across these 8 locations. Their mechanism for accessing data has been primarily mobile; there's been no laptop or desktop usage that we can tell. And of the features we have, one that really applies to them is the ability to create alerts on time series data that get pushed out to you.
This completely removes the need for a data engineering team, or anybody, to create an Airflow job. From your phone, in about 4 or 5 taps, you can look at usage data, set levels, and then create a push notification alert so that as that data changes, you get notified on your phone. This is a use case we saw this user set up, and it surprised us at first that they would need to track this, but it's just so natural for this person in managing all these locations. Without this access, I think they would have had to hire a few more folks to split that task out. With Zing Data, they're able to handle it themselves.
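The alerting workflow, setting levels on a time series and getting a push when the data crosses them, can be sketched as a small state check. This is an illustration under assumptions rather than Zing's implementation: `ThresholdAlert` and `evaluate` are hypothetical names, and notifying only on state changes (rather than on every breaching point) is a design choice made here to avoid repeated pushes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ThresholdAlert:
    """User-set levels for one metric, e.g. units in stock at a store."""
    metric: str
    low: Optional[float] = None   # alert when the value drops below this level
    high: Optional[float] = None  # alert when the value rises above this level

    def state(self, value: float) -> str:
        if self.low is not None and value < self.low:
            return "low"
        if self.high is not None and value > self.high:
            return "high"
        return "ok"


def evaluate(alert: ThresholdAlert, points: Iterable[Tuple[str, float]]) -> List[str]:
    """Return one message each time the series moves into a breaching state."""
    messages: List[str] = []
    previous = "ok"
    for timestamp, value in points:
        current = alert.state(value)
        # Fire only on a transition, so a metric that stays low doesn't push every refresh.
        if current != "ok" and current != previous:
            messages.append(f"{timestamp}: {alert.metric} is {current} ({value})")
        previous = current
    return messages


if __name__ == "__main__":
    alert = ThresholdAlert("chairs_in_stock", low=20)
    series = [("mon", 40), ("tue", 25), ("wed", 18), ("thu", 12), ("fri", 30), ("sat", 15)]
    print(evaluate(alert, series))
    # ['wed: chairs_in_stock is low (18)', 'sat: chairs_in_stock is low (15)']
```

Each returned message would map to one push notification in a deployed system; scheduling the check against fresh data is left out of this sketch.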
[00:23:27] Unknown:
The other side of that equation is that when you do have a cellular connection as your primary method of accessing the Internet, that adds some constraints in terms of the volume and complexity of data that you want to be working with. And I'm curious how you think about that balance on the server side and also in the user experience aspect of helping people understand how to do their subsetting, maybe how to do some of their filtering so that they're not dumping the entire table to their phone, also taking into account things like constraints of storage space and memory on these mobile devices and how you think about being able to provide an interactive experience for those users while still being able to get useful information and not be hamstrung because of that kind of limited bandwidth.
[00:24:16] Unknown:
Yeah. Absolutely. One thing we never take for granted is good, strong connectivity. By designing for limited and spotty networks from the ground up, we're able to build all of those design decisions into our back end. One thing we do once you've connected any of your data sources is work out, from the data types of the tables or columns you're interested in, that something looks like time series data, and then automatically apply a relative date filter for you. So you don't need to pull down sales data for the last 10 years, even if that's the only database you have. The automatic default in that scenario is to pull down the last 7 days, and the user can again drag and drop to change that. That's an example of a smart default that we think really speaks to the use case where somebody's out in the field on a spotty 3G connection, because 5G has yet to get there, but they want access to this one piece of data. Other smart defaults we think about take multiple fields into account. If we can tell that this is time series data with monotonically increasing values on particular attributes, that tells us it's something that can be aggregated. So for a data type that is an integer, a bigint, or a float, we can aggregate on it, and that becomes a filter option that is already done for you or that you can choose from. It makes no sense to show an aggregation option on string data, which you kind of see with other BI vendors, so for us, that's a smart default. It takes all this cognitive load off the user of having to think about the underlying data, and lets them think more about the question they're asking and the answer they're trying to get to.
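The smart defaults described here, spotting a time column to apply a relative "last 7 days" filter and offering aggregations only on numeric columns, can be sketched from column types alone. This is a hypothetical illustration: the type-name sets, the `smart_defaults` function, and its output shape are assumptions, and it skips the monotonicity check the guest also mentions.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

# Assumed type names; real warehouses report many more variants.
NUMERIC_TYPES = {"integer", "int", "bigint", "float", "double", "real", "numeric", "decimal"}
TEMPORAL_TYPES = {"date", "timestamp", "timestamptz", "datetime"}
NUMERIC_AGGS = ["sum", "avg", "min", "max"]


def smart_defaults(columns: Dict[str, str], today: date, days: int = 7) -> dict:
    """Suggest a default date filter, measures, and dimensions from column types.

    `columns` maps column name -> declared SQL type, in table order.
    """
    # Normalize e.g. "DECIMAL(10,2)" -> "decimal" before matching.
    types = {name: declared.lower().split("(")[0].strip() for name, declared in columns.items()}
    time_cols = [c for c, t in types.items() if t in TEMPORAL_TYPES]

    # Relative date filter on the first time column, meaning `WHERE col >= start`.
    date_filter: Optional[Tuple[str, str]] = None
    if time_cols:
        date_filter = (time_cols[0], (today - timedelta(days=days)).isoformat())

    # Numeric columns become measures; everything else (strings, dates) is a
    # dimension to group or filter by, with no sum/avg offered.
    measures = {c: list(NUMERIC_AGGS) for c, t in types.items() if t in NUMERIC_TYPES}
    dimensions: List[str] = [c for c in types if c not in measures]
    return {"date_filter": date_filter, "measures": measures, "dimensions": dimensions}


if __name__ == "__main__":
    schema = {"order_date": "DATE", "region": "VARCHAR(64)", "amount": "FLOAT", "units": "BIGINT"}
    print(smart_defaults(schema, today=date(2022, 10, 10)))
```

Because the defaults come from metadata only, they can be computed before any rows are fetched, which is exactly when bandwidth is scarcest.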
[00:26:05] Unknown:
Data engineers don't enjoy writing, maintaining, and modifying ETL pipelines all day, every day, especially once they realize that 90% of all major data sources like Google Analytics, Salesforce, AdWords, Facebook, and spreadsheets are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from over 40 countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get real-time data flow visibility with fail-safe mechanisms and alerts if anything breaks; preload transformations and auto schema mapping to precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse ETL capability to move the transformed data back to your business software to inspire timely action.
All of this plus its transparent pricing and 24/7 live support makes it consistently voted by users as the leader in the data pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata today and sign up for a free 14-day trial that also comes with 24/7 support. Because you are giving some of this exploratory capability through the mobile experience, it also suggests that the people who are using it aren't necessarily going to be supported by a large team of data engineers and analysts. And I'm curious how that influences the way you think about the product you're building, who you're building it for, and how to surface some of these concerns around the vagaries of data and how to work with it.
[00:27:41] Unknown:
Yeah. That's exactly the point. In our research as we were building out the company, talking to many different people, one thing we commonly heard from data practitioners and data engineering teams at big SaaS companies is just the amount of work they have to keep up with and the number of report-request tickets they have to respond to. A data team is just not an infinite resource, and yet they have all these asks. And we know that a good chunk of these questions are very simple questions, at most 4 to 5 lines of SQL, that your highly paid data science team should not be working on. This is where we think about the democratization of this tech: making it really simple, giving you an easy button to do your own data exploration with these smart defaults, will lead to higher data engagement.
That's the genesis of how we thought about these query composition techniques and the query intelligence layers that we needed to build based on the underlying data. By empowering these folks and letting them self-serve, we think they'll be able to ask smarter questions. Now, we still want the data science team involved in the highly complex analytical workflows and the machine learning models that they should appropriately be spending their time on. But let the other 40 to 50% of easy questions be self-service, and let that be something teams can work on with an app like Zing.
[00:29:20] Unknown:
As you have gone from your initial conception of the problem you were trying to solve, and the ideas you had about how you were going to approach it and what the solution would look like, through those initial design steps and working with customers to where you are today, I'm curious what are some of the useful mistakes you've made and some of the assumptions you had that have been invalidated or challenged or that you've had to change in the process?
[00:29:49] Unknown:
Yeah. Absolutely. I think one thing is something I brought up earlier: the difference between global usage of BI in emerging economies versus BI usage in the States and Europe. Early on, our first few versions of Zing Data were completely app only. You had to download the app to interact with any kind of data. The more we engaged with users, the more we realized ubiquity of access is actually important. So the later addition has been web access, where you can run these queries with the same smart defaults and similar point-and-click mechanisms; not the taps, swipes, and drags, but similar mechanisms aimed at a web workforce.
So that was something that we, you know, didn't think we needed initially. We did the research based on user feedback and were able to get that out there pretty quickly. Now we have similar functionality across all three platforms. Another thing we thought about initially was how much time we should really spend on the data connector piece. We thought it would be important to have data connectors for MySQL and Postgres, leave it at that, and then iterate on product features based on signal from users. The more we talked to folks and the more engagement we saw, the more we realized there's this wide swath of underlying database tech. I think you've probably seen that increasingly in your conversations over the last 5 to 10 years: people are running the gamut, right, from your Google Sheets database to MySQL and Postgres, to all the AWS equivalents of those, and then your Databricks and data lakes. So that was a later realization for us: we had to spend some amount of time on a regular cadence to build out these connectors and make sure they support all the enterprise use cases, things like SSH tunneling and custom certificates. That was a later addition, but we were able to centralize on it pretty quickly. A third thing we saw was also related to this database connector tech and how we did our dialect parsing. They all say they're ANSI SQL-99 compliant, but there are still differences that we've come to realize.
And so the initial version of our query intelligence layer in the back end had very distinct mechanisms for dialect parsing and query composition to account for these different ANSI SQL variants. Then, when we started to add more connectors, we realized how unwieldy that code really started to become. So there was a quick effort by our team to centralize and coalesce it into an extensible framework where we could add a dialect by specifying only the things that had changed. We put time into that effort, and we've certainly benefited from it. The trade-off with such an effort is that once you've got centralized code, the testing regimen needs to be up to par with every release, so we had to invest time in that as well. Those were just things we recognized we had to do. But, you know, with any startup, sequencing is always a thing. So
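One common way to build the kind of extensible dialect framework described here is a base class that holds the shared behaviour, with each dialect overriding only what differs. The sketch below illustrates that general pattern under that assumption; it is not Zing Data's code, and the class names, `limit_query`, and the `DIALECTS` registry are hypothetical. The differences it encodes are real ones: MySQL quotes identifiers with backticks, and SQL Server (T-SQL) uses `TOP` instead of `LIMIT` and bracket-quoted identifiers.

```python
class Dialect:
    """Base dialect: behaviour shared by most engines (ANSI-style quoting, LIMIT)."""

    quote_char = '"'

    def quote(self, ident: str) -> str:
        q = self.quote_char
        # Escape an embedded quote character by doubling it.
        return f"{q}{ident.replace(q, q * 2)}{q}"

    def limit_query(self, columns, table, n: int) -> str:
        cols = ", ".join(self.quote(c) for c in columns)
        return f"SELECT {cols} FROM {self.quote(table)} LIMIT {n}"


class PostgresDialect(Dialect):
    pass  # Nothing differs from the base for these two features.


class MySQLDialect(Dialect):
    quote_char = "`"  # Only identifier quoting differs.


class SQLServerDialect(Dialect):
    # T-SQL has no LIMIT; it uses TOP and [bracketed] identifiers.
    def quote(self, ident: str) -> str:
        return "[" + ident.replace("]", "]]") + "]"

    def limit_query(self, columns, table, n: int) -> str:
        cols = ", ".join(self.quote(c) for c in columns)
        return f"SELECT TOP {n} {cols} FROM {self.quote(table)}"


DIALECTS = {
    "postgres": PostgresDialect(),
    "mysql": MySQLDialect(),
    "sqlserver": SQLServerDialect(),
}

for name, dialect in DIALECTS.items():
    print(name, dialect.limit_query(["id", "amount"], "orders", 10))
```

The trade-off mentioned above shows up directly here: because every dialect inherits from one base, a change to `Dialect` affects all of them, which is why a shared test suite matters.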
[00:33:03] Unknown:
Yeah. The frustration over ANSI SQL not actually being standard has been a recurring theme that has come up many times over a series of interviews. Some part of me wonders if it's nefarious, to the point where you get this automatic lock-in to a database vendor because of their SQL dialect, but I don't know. That's a broader theory. That's part of the reason the Postgres dialect has become so ubiquitous, as kind of a self-fulfilling prophecy: we'll just use the Postgres version because other people use it, and then the next engine is just going to use the Postgres dialect because that's what other people are using, and so it's become this snowball where everybody just says, okay, we'll just use the Postgres dialect.
[00:33:44] Unknown:
Yes. Exactly. Yep. So one thing that is on our to-do list from a future standpoint is SQL Server integration. It's strange, because that's the first database I ever worked with in a professional setting, and it's something we'd like to be able to connect to. But there are strong differences there in ANSI SQL compliance. So.
[00:34:03] Unknown:
Absolutely. And the additions, having written some fairly gnarly T-SQL. It's been a while. But
[00:34:09] Unknown:
Right. T-SQL, PL/SQL. Yes.
[00:34:12] Unknown:
And so for an individual or a team who is implementing Zing Data, I'm wondering if you can talk through the typical workflow of going from "we've got it set up" to "now I actually want to start building out a suite of reports and enabling some exploration," or maybe some of the ways that a small data team can provide guidance and guardrails for other members of the business.
[00:34:39] Unknown:
We've initially and purposefully tried to make it dead simple to connect to a data source. The steps would be: you go to Zing Data, you sign up, and then there's a simple drop-down to add credentials for any kind of data source you're connecting out to. So whether it's something like Google Sheets or your Databricks or BigQuery, there's info related to each of those; for some, there's JSON you can drop in to connect out. Then there are configuration settings for client certificates and SSH tunnels if those are things you want to set up. After that, you have access to this data source. You are also put in what we call the default organization.
This is our concept of a business unit or enterprise, or anything that you want to extend it to. You can then invite colleagues into it, and from that point on, you can also limit access to certain tables or data sources for different invitees. So there's intuitive RBAC built in as well, which is almost implicitly required for data. From that point on, anybody can start asking questions of their data based on the access level they have. There's this tap-to-query mechanism where you go into a table, tap on columns, set the right filters, and you get a chart representation if the query allows it, or a data table representation.
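The per-invitee access limits described above amount to role-based access control. The following is a minimal sketch of that idea, assuming access is granted per role at the level of (data source, table) pairs with a `"*"` wildcard for a whole source; the `Organization` class and its methods are hypothetical and not Zing Data's API.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """Hypothetical organization model with role-based access to tables."""

    # role name -> set of (data_source, table) grants; "*" means every table in that source
    roles: dict = field(default_factory=dict)
    # member email -> role name
    members: dict = field(default_factory=dict)

    def define_role(self, role, grants):
        self.roles[role] = set(grants)

    def invite(self, member, role):
        if role not in self.roles:
            raise KeyError(f"unknown role: {role}")
        self.members[member] = role

    def can_query(self, member, source, table):
        role = self.members.get(member)
        if role is None:
            return False  # not a member of this organization
        grants = self.roles[role]
        return (source, table) in grants or (source, "*") in grants


org = Organization()
org.define_role("analyst", {("sales_db", "*")})
org.define_role("field_rep", {("sales_db", "orders")})
org.invite("ana@example.com", "analyst")
org.invite("raj@example.com", "field_rep")
print(org.can_query("raj@example.com", "sales_db", "customers"))
```

Attaching grants to roles rather than to individual people keeps invitations cheap: inviting a new field rep is one `invite` call, and changing what all field reps can see is one `define_role` call.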
From that point on, you can share that result with colleagues. They now get pulled into that query result, and you can have a collaborative thread based on those results. Another key thing that's different for Zing Data is how we've thought about the home screen interface. When you first open Zing on your phone, or you're on the web app and go straight to the home screen, you see a knowledge graph of questions that have been asked by your colleagues. They're ranked similar to a news feed or a TikTok feed, where you can tell these are popular questions that people in my network or my company graph have asked and commented on. So there's a company knowledge base being built as a result of these data questions. That's the engagement mechanism we really think about: you're on your phone, and you're able to see what people are asking.
You're able to start getting results right away without having to construct a query from scratch. And we think this is the best way to give people broader access to data.
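The interview doesn't say how the home feed is actually ranked beyond it being "similar to a news feed." As a stand-in, this sketch uses a Hacker News-style gravity score: engagement (views plus weighted comments) divided by a power of the question's age. The `Question` fields, the comment weight, and the gravity exponent are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Question:
    title: str
    views: int
    comments: int
    age_hours: float


def score(q: Question, comment_weight: float = 3.0, gravity: float = 1.5) -> float:
    # Engagement (comments count more than views), decayed by age so that
    # fresh, active questions rise to the top of the feed.
    engagement = q.views + comment_weight * q.comments
    return engagement / (q.age_hours + 2) ** gravity


def rank_feed(questions):
    return sorted(questions, key=score, reverse=True)


feed = rank_feed([
    Question("Churn last month", views=100, comments=0, age_hours=48),
    Question("Weekly sales by region", views=40, comments=5, age_hours=2),
    Question("Tomato sauce levels by franchise", views=10, comments=1, age_hours=1),
])
print([q.title for q in feed])
```

The age decay is what makes this behave like a feed rather than an all-time leaderboard: the two-day-old question with the most views still drops below newer ones that people are actively discussing.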
[00:37:03] Unknown:
Another aspect of what you're building is being mobile first. There is a certain set of patterns and questions that people are going to want to use for interacting with it. I'm curious if you see it as being the primary or sole business intelligence system that users implement or if it is also often used as an augmentation to other business intelligence or data exploration suites that might be available to an organization?
[00:37:31] Unknown:
I think for where we are now, again, relating back to the spectrum of users that we're seeing: from deskless-workforce users who have never interacted with a data team, to people in divisions inside really big SaaS companies who are using us because they can't get time on the data team. Given that spectrum of users, I think we are a really good solution for these folks to get access to the data. In those scenarios, for the product manager at a big SaaS company who has waited a month for a turnaround on a report they asked their analyst team for, Zing Data would be an augmentation to their existing tooling.
For the person who has never had any kind of data engineering team or any kind of visualization software, we would be their primary mechanism. Long term, the way we think about it is that we really focus on the use case: how people are interacting with data now and into the future. I think if we do this right, we will become the primary mechanism, because if you've made this really simple, more people are going to want it and more people will want to engage with the data, which is the ultimate dream of, I suppose, any data team.
[00:38:44] Unknown:
In your experience of building Zing Data, both the technology and the business, and working with your customers, what are some of the most interesting or innovative or unexpected ways that you've seen the product used?
[00:38:55] Unknown:
We continue to be surprised by this. The example I gave earlier, the concert operator using us to look at seating chart info, was something we had never thought would manifest as a BI use case. We've also seen sign-ups from national pizza chains that want to track levels of tomato sauce across their different franchises. That's another use case, inventory management, that is atypical for what I think of as a typical BI use case. I think we will continue to be surprised by the use cases here. When we make something as simple as we have with Zing, it gets people thinking about how best to use it. If I can use it for this, why can't I use it for that as well? Why limit myself to only sales data? Let me look at product usage data. Why limit myself to that? Let me look at developer commit data. We've seen quite interesting use cases, but these are two good examples that we never thought would be reality.
[00:39:53] Unknown:
And what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:39:58] Unknown:
For us as a startup working with some really interesting enterprises, it's been interesting to see and evolve our road map based on what we think is going to be important and what enterprises want now. At Zing Data, we're bridging the gap between where we see the future and where we are now, which is that enterprises primarily use visualization software on their laptops. So the learning for us has been how to bridge that gap and how to sequence the feature sets correctly, so that we are solving for both use cases. We can help the farmer who is out in the field wanting to look at his machine IoT data, as well as the person at the big SaaS company wanting to look at product usage data. The web interface is one good example: we initially didn't think we needed it, but later came to realize it was a big part of this. Those have been interesting, I think. I've mentioned the thing around data connectors and how there's been broad usage across different types of databases.
And we'll continue to evolve. I think the future is pretty interesting for us, to see what else we can bring in.
[00:41:14] Unknown:
And so for people who are interested in being able to access their data away from their desk or away from their laptop, what are the cases where Zing is the wrong choice?
[00:41:24] Unknown:
Yeah. Absolutely. I think Zing Data really shines for the 40 to 50% of asks on data teams right now, which is, you know, give me product data, usage data, user data, and so on: the 4 to 5 lines of SQL that people can easily query off of. We really shine in those use cases. You can certainly construct more complex SQL; we actually have a full SQL editor as well, where you can save a query as a question and share it out. Where we think Zing is probably not the best choice at present, and I think this is a gap we'll cover later on as we evolve, is where you need to do really deep analytical modeling, where you need a Jupyter Notebook to crunch through the different iterations and possibilities.
And I think that's where your data science teams really help. In those cases, at present, Zing Data would not be the tool we'd use. We still think that's a gap we can close in the next few months to a year, and that's something we're really looking forward to. But at present, this is where we are.
[00:42:27] Unknown:
As you continue to build out the product, what are some of the things you have planned for the near to medium term or any particular projects or problem areas that you're excited to dig into?
[00:42:36] Unknown:
Yeah. Absolutely. We are heads down in product and feature development. One thing we continue to spend and budget time on is increasing our list of database connectors. SQL Server, for example, is something we'd like to add, and Azure Synapse and the other technologies there would be things we'd love to plug into. Those are on our product to-do list. Then there's expanding our enterprise feature list, such as audit log availability and surfacing those logs in the console so people can really get access to them. Those are just a few key features that we know we need to build and are budgeting time toward. But there are also more interactions and mechanisms we're keen on developing, such as the ability to do table joins via a gesture, where you long-press and drag between tables and it automatically constructs the CTE for you. Those are pretty cool things that we're really excited about, and we have a few early concepts that we're iterating on at this point.
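The gesture-driven join described above ends with generated SQL. As a rough sketch of what "automatically constructs the CTE" could mean, this hypothetical helper turns the two dragged tables and a chosen key pair into a `WITH ... AS (...)` query. The function name, the `l`/`r` aliases, and the single inner join on one key pair are assumptions; the feature is described only as an early concept.

```python
def build_join_cte(left, right, left_key, right_key, columns, cte_name="joined"):
    """Compose a CTE that inner-joins two tables on one key pair.

    Identifiers are interpolated as-is, so they must come from trusted
    schema metadata, never from free-form user input.
    """
    select_list = ", ".join(columns)
    return (
        f"WITH {cte_name} AS (\n"
        f"  SELECT {select_list}\n"
        f"  FROM {left} AS l\n"
        f"  JOIN {right} AS r ON l.{left_key} = r.{right_key}\n"
        f")\n"
        f"SELECT * FROM {cte_name}"
    )


# e.g. long-pressing "orders" and dragging it onto "customers"
print(build_join_cte("orders", "customers", "customer_id", "id", ["l.amount", "r.region"]))
```

Wrapping the join in a CTE, rather than returning a bare join, leaves room for the tap-to-query filters and aggregations to be applied to the outer `SELECT` without touching the join itself.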
Are there any other aspects of the work that you're doing on Zing Data or the overall space of mobile-first access to business intelligence and data assets that we didn't discuss yet that you'd like to cover before we close out the show?
I think one thing that I've seen come up, which has been kind of interesting, is a natural language interface for asking questions of data. We thought long and hard about this as we were setting up the company, about what a solution for something like that would look like. In every prototype we set out, we almost always found that tapping and swiping into your data was a lot faster than chatting or typing a natural language question. We think there's quite a gap in being able to cleanly ask a natural language question of your data and get the appropriate results. That's something that always comes up as we do demos. People seem really intrigued by the concept of what we do at Zing Data, and a natural extension of that is to have a voice interface to it. We continue to think about it. I think there's still some way to go, and we may eventually have an answer to this. But it's pretty interesting to see how, on a phone and in an app like the one we have with Zing, you can get results really quickly by just tapping and swiping. So that's one intriguing thing that keeps being brought up.
[00:44:59] Unknown:
For anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technologies that's available for data management today.
[00:45:14] Unknown:
Yeah. Absolutely. So my info should be on there. Feel free to reach out to me on Twitter or LinkedIn. I'd love to connect with folks; I think this is a great audience. I am really enamored by the number of tool sets out there for problems that we've had for quite a while. I mentioned my first job out of college was writing reports in T-SQL for SQL Server. Being able to have a data cataloging solution, things that do data lineage and orchestration, those would have been dream tools that we just didn't have back in those days, and I'm so glad to have them now. I think there's an interesting future here, where it still seems like there are point solutions for a lot of these things. To really get a pipeline and infrastructure up from the ground up, you have to work with 25 to 30 different tools, and there are also differences based on your underlying database tech. So I think it's an interesting future, and there's some consolidation that could really benefit a data practitioner.
And then there's also the target market we serve, which is the data consumer. We think long and hard about that, but I think the future is bright, and it's quite promising to see the number of folks working on this.
[00:46:23] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Zing Data, helping people gain better and more democratized access to their data assets. I definitely appreciate the work that you and your team are doing on that. So thank you again for taking the time, and I hope you enjoy the rest of your day.
Absolutely. Thanks so much, Tobias. It's a great show. I love listening, and thank you for the opportunity.
Introduction to Sabin Thomas and Zing Data
The Genesis of Zing Data
Mobile Business Intelligence: Challenges and Solutions
Real-World Use Cases and User Interaction
Engineering Mobile Native Capabilities
Supporting Developing Economies and Unique User Needs
Empowering Users Without Data Teams
Lessons Learned and Product Evolution
Implementing Zing Data: Workflow and Use Cases
Unexpected Use Cases and Future Plans
Natural Language Interface and Future Directions