Summary
Most analytics platforms are designed for internal use by business stakeholders. As data becomes more widely available, and literacy in how to interpret it and act on it improves, there is a growing need to bring business intelligence use cases to a broader audience. GoodData is a platform focused on simplifying the work of bringing data to employees and end users. In this episode Sheila Jung and Philip Farr discuss how the GoodData platform is being used, how it is architected to provide scalable and performant analytics, and how it integrates into customers’ data platforms. This was an interesting conversation about a different approach to business intelligence and the importance of expanded access to data.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- GoodData is revolutionizing the way in which companies provide analytics to their customers and partners. Start now with GoodData Free that makes our self-service analytics platform available to you at no cost. Register today at dataengineeringpodcast.com/gooddata
- Your host is Tobias Macey and today I’m interviewing Sheila Jung and Philip Farr about how GoodData is building a platform that lets you share your analytics outside the boundaries of your organization
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing what you are building at GoodData and some of its origin story?
- The business intelligence market has been around for decades now and there are dozens of options with different areas of focus. What are the factors that might motivate me to choose GoodData over the other contenders in the space?
- What are the use cases and industries that you focus on supporting with GoodData?
- How has the market of business intelligence tools evolved in recent years?
- What are the contributing trends in technology and business use cases that are driving that change?
- What are some of the ways that your customers are embedding analytics into their own products?
- What are the differences in processing and serving capabilities between an internally used business intelligence tool, and one that is used for embedding into externally used systems?
- What unique challenges are posed by the embedded analytics use case?
- How do you approach topics such as security, access control, and latency in a multitenant analytics platform?
- What guidelines have you found to be most useful when addressing the concerns of accuracy and interpretability of the data being presented?
- How is the GoodData platform architected?
- What are the complexities that you have had to design around in order to provide performant access to your customers’ data sources in an interactive use case?
- What are the off-the-shelf components that you have been able to integrate into the platform, and what are the driving factors for solutions that have been built specifically for the GoodData use case?
- What is the process for your users to integrate GoodData into their existing data platform?
- What is the workflow for someone building a data product in GoodData?
- How does GoodData manage the lifecycle of the data that your customers are presenting to their end users?
- How does GoodData integrate into the customer development lifecycle?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with GoodData?
- Can you give an overview of the MAQL (Multi-Dimension Analytical Query Language) dialect that you use in GoodData and contrast it with SQL?
- What are the benefits and additional functionality that MAQL provides?
- When is GoodData the wrong choice?
- What is on the roadmap for the future of GoodData?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
- GoodData
- Teradata
- ReactJS
- SnowflakeDB
- Redshift
- BigQuery
- SOC2
- HIPAA
- GDPR == General Data Protection Regulation
- IoT == Internet of Things
- SAML
- Ruby
- Multi-Dimension Analytical Query Language
- Kubernetes
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
[00:00:17] Unknown:
What are the pieces of advice that you wish you had received early in your career of data engineering? If you were to hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly Media on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar and Pachyderm.
With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. GoodData is revolutionizing the way in which companies provide analytics to their customers and partners. Start now with GoodData Free, which makes our self-service analytics platform available to you at no cost. Register today at dataengineeringpodcast.com/gooddata.
[00:01:33] Unknown:
Your host is Tobias Macey, and today I'm interviewing Sheila Jung and Philip Farr about how GoodData is building a platform that lets you share your analytics outside the boundaries of your organization.
[00:01:43] Unknown:
So, Sheila, can you start by introducing yourself? Hi, everyone. My name is Sheila Jung, and I started working for GoodData almost 5 years ago as a solutions architect in professional services. I am currently a senior manager in product enablement, leading a team of customer engineers and developer advocates. And our team's main focus is to empower the GoodData developer community and enable our internal sales team. Thanks for having us. And, Philip, how about yourself?
[00:02:09] Unknown:
Hello, everyone. My name is Philip Farr. Having been with GoodData for a handful of years at this point, I've held several different positions, which have touched on various aspects of the data engineering world. My current role is as a senior manager of technical program management and customer success, where I help oversee a team which partners with our customers to provide technical expertise in data product development, and ensure a positive, mutually beneficial experience throughout our entire customer life cycle journey. Thanks for having us today, Tobias.
[00:02:46] Unknown:
And going back to you, Sheila, do you remember how you first got involved in the area of data management?
[00:02:51] Unknown:
Yeah. Definitely. I got involved in data management right after college when I joined a boutique consulting startup as a BI and ETL consultant. This is where I started getting familiar with various data management, warehousing, and visualization tools. That startup was eventually acquired by Teradata, and I've been in the data space ever since. And, Philip, how about you?
[00:03:11] Unknown:
So my first professional experience was probably when I joined a consulting firm, which specialized in cybersecurity and privacy strategy, right out of college. During that time, GDPR was becoming an increasingly hot topic in the security and privacy space. Many companies were searching for answers to difficult questions, like: do we already have, or can we create, some type of data flow diagram to help us identify the right data relating to an individual? Or, how do we ensure proper deletion once an individual requests it, after you've identified all of this information and where it was located?
So my firm was really brought in to help solve these complex problems, and that's pretty much where I was exposed to data management.
[00:04:08] Unknown:
And so in terms of the work that you're doing at GoodData, can you give a bit of a description about what it is that you're building there and as much of the origin story as you're each familiar with?
[00:04:18] Unknown:
Sure. So GoodData's origin story starts with our CEO and founder, Roman Stanek. Roman was previously the founder of multiple companies, like NetBeans and Systinet. And after his last company got acquired, he started GoodData in 2007, and his mission was to disrupt the BI space and monetize big data. At GoodData, what we're doing right now is we're building a way companies can provide analytics directly to their customers and partners. GoodData is seamlessly integrated into workflows and provides authenticated access to business reporting, dashboards, and ad hoc analysis, which is delivered in a timely, relevant, and customizable way for the independent users.
[00:05:02] Unknown:
And the business intelligence space in general has been around for decades at this point, with a number of different iterations of it and different generations of how the tools are used and the types of data that they're dealing with and questions that are being asked and answered. And I'm wondering what the factors are that might lead somebody to choose GoodData over any of the other contenders in the space, and some of the use cases that it's uniquely well suited
[00:05:28] Unknown:
to work for? Yeah, that's a great question. And I think, you know, we are from GoodData, so obviously we're slightly biased in this regard. But we really provide you with a platform to help boost the adoption of your application by embedding modern analytics. And the goal is to empower all of your end users via ad hoc exploration or templatized analytics, and really remove what we would consider to be the costly customization of one-off reports. Right? Say, an internal organization spinning out one-time reporting to support or answer whatever questions they're seeking. And so we've developed this cloud-hosted solution where changes roll out very quickly, and we really enable those end users to continually benefit from our platform improvements. Right? And this can be done through, you know, powerful analytics for any persona, self-service dashboards, interactive visualizations, what you would expect from an analytics tool.
On another hand, we're really looking at flexible pricing options, which help enable companies of varying sizes to use GoodData. Right? So we recently launched Free and Growth pricing tiers for easy entry and quick scaling into different departments or across various solutions for customers. And then, you know, rounding out those offerings, we do have an enterprise offering, our long-standing offering, which supports thousands of users and terabytes of data. So we're really pushing the boundaries of scale there. From the flexible platform perspective, right, like, this is another one of the key differentiators: the platform is built for flexibility for developers, and we provide kind of infinite options here.
We have a React-based JavaScript library for creating analytical interfaces, which users and their creators absolutely love. In addition to this, there are very well documented APIs which allow, you know, creators to interact with all platform capabilities via SDKs that we have developed. Furthermore, I think another area is really the robust and flexible data integration, where we have the ability to support any, you know, data source or technology stack. We integrate with things like Snowflake, Redshift, BigQuery, so all of the big cloud data warehousing names, as well as the ability to ingest data through hundreds of different connectors.
And this is not limited to, say, small data volumes. It's any data volume, any type of connector. We can do custom connectors as well. And then, you know, rounding it out is the enterprise-level security and governance structure that we have put in place. We support things like agile change management, real-time user provisioning, solution monitoring, and we have a variety of compliance certifications in place: SOC 2, HIPAA, GDPR. These are all things that we're able to help you solve for as a customer.
[00:08:45] Unknown:
Regarding the use cases that you're mentioning, Tobias, I would say that there's a wide range of use cases and industries, anything ranging from financial services to retail management. We don't actually have a single industry that we focus on supporting.
[00:08:59] Unknown:
In going off of what Sheila said, I think our primary focus has really been to develop an analytics solution which we call Powered by GoodData, and that's kind of our term for it. And the best way to explain Powered by GoodData is it really is an industry-agnostic use case, or a business model if you will, where we partner with a customer, and then that customer is looking to distribute analytics to all of their customers and provide it to, you know, tons of end users. And so a typical GoodData use case would be to distribute analytics, that same version of analytics, at scale, in the context of the end users' data, for as few as, say, 5 customers, but as many as 25,000.
[00:09:49] Unknown:
And as Sheila mentioned, right, this could be done pretty much as you can imagine for any industry. And that's one of the interesting pieces of the product that you're building, where a large component of the business intelligence market is focused on these internal analytics use cases, where you connect it up to your different data sources, usually some sort of data warehouse, and you have scheduled reports and dashboards that internal business users can look at to get a temperature check and get some sort of sense of how things are going in their business, maybe things like inventory or sales figures.
Whereas with GoodData, what you're saying is that it's primarily focused on external analytics, where maybe a SaaS platform is providing some view to their customers in terms of the usage that they're getting out of the platform, or maybe sales figures that they're tracking in something like HubSpot. And so I think that that is one of the unique things about GoodData, where it is much more external facing, and by virtue of that you have to take much more of a platform approach versus a point solution that you might get with something like Pentaho or Redash or Superset. And I'm wondering what your thoughts are on just the overall market of business intelligence tools, how that's evolved in recent years, some of the contributing trends in the technology and business use cases that have brought us to where we are today, and where you're going with GoodData. Yeah, Tobias. That is a really great one. So I would say that business intelligence tools
[00:11:22] Unknown:
have evolved in recent years to help businesses make informed decisions. So the change is that analytics has been used for decades to help businesses make informed decisions, both strategically and operationally, by deriving insights from the data that they've collected. In more recent years, that has now shifted towards analytics everywhere. So rather than confining data analytics to a single use case or location like you were mentioning, we are seeing an increased demand and value-add for distributed analytics for your business partners, employees, customers, pretty much everybody. And the goal is to enable all the people at various levels and businesses to make these data-driven decisions with confidence.
[00:12:03] Unknown:
And then from the trend perspective, Tobias, I think there are a couple worth noting that we do see customers actively pursuing and building upon. So the first one that I'd like to call out is really the democratization of technology. Right? And this is really where customers or users are having increased accessibility to technology, and this is pretty much pervasive throughout every industry. Right? And for us, it's really the change towards analytics everywhere, and it's attributed to the increased accessibility of technology and modern-day reliance on this type of accessibility.
So this spans all aspects of a business, from, say, the employees to clients or customers to stakeholders or executives, right, who all need to be able to access and make decisions in real time, right, make those data-driven decisions that are important. And so, you know, we're seeing this in a variety of industries. One example would be, say, the sharing economy. Right? Now, as opposed to maybe a historical approach where decisions are made just based on a couple or a handful of internal stakeholders, now you're needing to put data in the hands of many, many users. Right? The people who are actually sharing or participating in the sharing economy, you know, the providers or the renters. And how do they get the access to the data that they need to make these decisions in real time? And that has a really high influence over your business.
And, also, like, when you provide that type of analytics to those people or those individuals, it helps enable companies to overcome their competitors. Right? You're giving power back to that user. In a similar vein, but a different trend: IoT, the Internet of Things, is an area where we see many customers playing. Businesses where they may not have realized historically what data they had access to, or how powerful that data is to provide it back to their end users, or how to effectively share that with people across different levels of data literacy. And so we're talking about more archaic industries like auto parts or transportation, which can be slower moving and more publicly, you know, or governmentally regulated.
And so it's really giving these large industries access to analytics, which they probably never had thought about. They never maybe requested it, or they may not have ever thought that this would ever come to their industry. But we're helping enable those types of companies to modernize and provide insights to their customers.
[00:15:09] Unknown:
And the other piece that's interesting is the fact that, as you mentioned, you're developer oriented, where you're focusing on exposing a set of rich APIs for being able to build analyses and visualizations on the underlying data, where a lot of the existing suite of business intelligence tools are built as a vertically integrated solution where the dashboarding is just a native capability, but also likely somewhat constraining. And I'm wondering what are some of the interesting and unique ways that that capability is being leveraged by your customers.
[00:15:47] Unknown:
So thanks, Tobias, for mentioning the APIs. That's definitely something that our end users are leveraging, especially from the developer aspect of people that are integrating the GoodData platform into their own analytics. And that's something that we leverage through embedded analytics. And the ways we're able to embed the GoodData platform into the client's products or their own application are in 3 different ways. The very first way is just through, like, direct embedding via iframe, where you're getting the GoodData reports directly into the client's app or platform, and that isn't utilizing the GoodData.UI APIs.
Another way is just to embed the link that is directly linked to the white-labeled GoodData portal. And then the third way is GoodData.UI, or gd.ui, which is the React-based development library, allowing developers to seamlessly integrate GoodData with their product. So combined with something that we developed called the accelerator toolkit, this pretty much streamlines the front end development efforts so that there's a lot of custom visualizations and integration
[00:16:58] Unknown:
into the customer's app. And going off of that, I think where this really plays a role is in, say, the customer development life cycle. Right? This is where we see the heaviest reliance on APIs, or open APIs and SDKs, which really allow for that seamless integration in, say, a CI/CD type of system. You know, the platform really provides, say, advanced support for release and rollout procedures, which a customer can leverage to cascade across different environments and manage all of those life cycles independently. So we're talking about, say, a development environment or a QA environment or a production, or a series of different production environments based on segmentation.
The platform also provides support for, say, on-demand provisioning, right, that can be driven and integrated with SSO. For example, say, on-demand provisioning via SAML 2.0 assertion. And so, you know, one of the good use cases, I think, Tobias, you had mentioned a use case, is we do have a customer who is using GoodData to provide analytics on project management software. There is a lot of technical complexity to the GoodData platform. They control all aspects of their application, in which they embed GoodData at different levels of granularity and expose the analytics to their end users.
They had very detailed customization that supports, you know, their preferred user experience through, say, our APIs and SDKs. And they manage all aspects of GoodData, including things like data loading, user management, including role assignment, provisioning, deprovisioning, and front end development for 2,000-plus of their own customers.
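To make the simplest of the three embedding options above concrete, here is a minimal sketch of iframe-based embedding in a React host application. The URL shape and the component name are illustrative assumptions, not GoodData's documented embed contract; the GoodData.UI library would replace the iframe with native React components.

```tsx
// Hypothetical sketch only: the dashboard URL format below is an assumption,
// not GoodData's documented embedding contract.
import React from "react";

interface EmbeddedDashboardProps {
  host: string;        // e.g. a white-labeled analytics domain
  workspaceId: string; // the tenant's workspace (data mart) identifier
  dashboardId: string; // the dashboard to render
}

export const EmbeddedDashboard: React.FC<EmbeddedDashboardProps> = (props) => (
  // Authentication would typically ride on an SSO session (e.g. SAML 2.0),
  // so the iframe inherits the logged-in user's data permissions.
  <iframe
    title="embedded-analytics"
    src={`${props.host}/embedded/#/workspace/${props.workspaceId}/dashboard/${props.dashboardId}`}
    style={{ width: "100%", height: 600, border: 0 }}
  />
);
```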
[00:19:07] Unknown:
And because of the fact that you are serving up these analytics capabilities to developers and end users who are integrating it into their own product, whereas a lot of the business intelligence market was oriented towards the business analysts and data scientists as their end users, I'm wondering what you have found to be some of the useful guidelines or guardrails for helping your customers with the accuracy and interpretability of the data being presented.
[00:19:37] Unknown:
One of the most useful guardrails is in regards to the semantic layer, and the semantic layer here is really the logical data model that we can speak about. This is a layer that ensures that everyone understands the data in the same way, including self-service users. So this semantic model can be leveraged for guided analytics and provides a shared understanding for those analyzed entities and their relationships. This means that objects that were created by analysts once can be used by other common users and helps them to interpret the data and perform ad hoc data discovery, and this is possible through our analytical designer. So as you were mentioning, when we're migrating from the concept of individual contributors or, like, individual reporters looking at data analysis,
[00:20:27] Unknown:
this semantic layer allows multiple people to take a look at the same data and understand it in the same way. And I think thematically, Tobias, like, one of the things that you're referring to, and one of the things that we find to be most valuable about GoodData, is this concept of migrating away from just these internally used business intelligence tools to these externally used, say, embedded analytical products. Right? And when we look at these as a whole, both have similar types of requirements for things like data security and compliance and the great user experience for productivity, say, the ease of development.
Maybe it's the different ways to integrate the data or the ability to build semantics around that data. Right? But when we consider the realm of embedded analytics, right, or something that we call analytics everywhere at GoodData, we're really looking to embed directly into that software or that application, and specifically in the context of that use case for that end user. Right? Unlike the internal analytics space, embedded analytics really requires very strong life cycle management capabilities. And we're referring to, you know, provisioning, versioning, how do you perform releases, how do you roll that out to many of your customers or your end users.
Right? If you have, say, 3 customers, you may be able to build a solution from scratch every single time and maintain those changes in silos. But if you start considering, say, 100 or 1,000 or maybe even tens of thousands of customers and up to 1,000,000 users, right, how do you manage that? You really need a completely different architecture and power to operate or handle that change management, and develop a solution for, say, different customer segments or different ways that you're choosing to monetize that data product or different access to different datasets. Right?
And so the difference is really that the internal analytics is centered around, you know, personal aid and productivity, where these, you know, SaaS embedded analytics provide an aid in collective productivity across users and across multiple organizations. The one other piece I'd like to add for that is there's an additional added complexity when you're considering many, many users. Right? And that's really, you know, how do you provision and deprovision all of these users, but also retain, say, complete control over different levels of access?
And that could be as granular as, say, the data row level. Right? Or it could be access to, you know, which dashboards or which reports they're getting, or the ability to create their own dashboard. So these are all complexities, I think, that we are solving for in this world of embedded analytics.
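To make the row-level access control idea concrete, here is a small, hypothetical sketch of how a multitenant platform can attach mandatory data filters to each user so that every query they run is automatically constrained. The types and function are illustrative and are not GoodData's actual API.

```ts
// Illustrative only: mandatory per-user filters for row-level security in a
// multitenant deployment. GoodData's real mechanism may differ.
interface DataPermission {
  attribute: string;       // e.g. "customer_id"
  allowedValues: string[]; // the only rows this user may see
}

interface TenantUser {
  userId: string;
  workspaceId: string; // one workspace (data mart) per tenant
  role: "viewer" | "editor" | "admin";
  permissions: DataPermission[];
}

// Rewrite a query's WHERE clause so the user's mandatory filters are always
// ANDed in; ad hoc exploration then cannot escape its row-level boundaries.
function applyMandatoryFilters(baseWhere: string, user: TenantUser): string {
  const clauses = user.permissions.map(
    (p) => `${p.attribute} IN (${p.allowedValues.map((v) => `'${v}'`).join(", ")})`
  );
  return [baseWhere, ...clauses].filter((c) => c.length > 0).join(" AND ");
}
```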
[00:23:36] Unknown:
And digging deeper into the GoodData platform itself, can you talk about how it's architected and some of the evolution that it's gone through as you have continued to build out new capabilities and stay up to date with the changing landscape of the data ecosystem?
[00:23:57] Unknown:
Sure, I can speak to the architecture piece. Right? We've developed what we consider to be a very modular set of components, which our customers can kind of slice and dice and append to one another to create this end-to-end distributed analytical solution. And so I'll kind of walk you through, maybe, data source level to, say, dashboard level. At the lowest level, you know, when we're talking about this end-to-end pipeline, we're really speaking to data ingestion. Right? We have 150-plus connectors that are available for regular download of data from all sorts of source systems. We also do direct connections to those cloud data warehouses like Snowflake and Redshift and BigQuery.
We can build out custom connectors in certain instances that sit on top of our customers' open APIs as well. So we've done all of this in the past. From there, we run through our ETL processes and load data into what we call ADS, as the acronym, which stands for Agile Data Warehousing Service. And this is our internal data warehouse for staging and transforming the data. It has, you know, all sorts of different tables and views to support the transformations. And then ultimately, you know, once we're ready to load this to different, say, tenants, right, or workspaces in GoodData terminology, we use a mechanism called Automated Data Distribution, or ADD.
That distributes data to the workspaces themselves. And what happens is, you know, we can do this from ADS. We can also do this directly from cloud data warehouses if the schemas match whatever is in the logical data model in the workspace. And so we do have a lot of flexibility there to, you know, load data into workspaces. And to define workspaces for the audience, a workspace really is this end storage, which is a data mart that is loaded with a specific subset of a customer's data, so that, you know, when their customer wants to view analytics in the context of their data, it's just that particular subset of data that's been loaded. Within that workspace, we have the logical data model, or the semantic model, which Sheila had referred to earlier.
And on top of that semantic model, we're able to easily build out dashboards, reports, metrics, all in the context of the business, as well as enable a key functionality of the platform, which is analytical designer, which we use as an ad hoc data discovery tool where you can easily drag and drop, slice and dice your data so that you can come to your conclusions and insights more quickly. We spoke about SDKs. Right? We have SDKs that interface with our open APIs, and then we also provide the tools for embedding, say, via iframe, or more granular embedding via our GoodData.UI, which leverages the React and Angular frameworks. And throughout all of this, right, we have the life cycle management tooling to control provisioning, releases, rollouts, user provisioning as well. And I think the guiding principle for all of this architecture really is governance and security. Right?
It provides, you know, a driving force behind the reason why our platform is architected the way it is. We support, you know, complete end-to-end SLAs. We have, you know, top security for those certifications like GDPR or HIPAA compliance. And we really have built GoodData as these platform components where we can, you know, fully manage our own infrastructure and provide that design which enables, you know, high-performance, largely scalable, and ultimately distributed analytical workspaces.
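As a rough mental model of the pipeline just described (connectors feeding ADS for staging and transformation, then Automated Data Distribution into per-tenant workspaces), here is a hedged TypeScript sketch. Every name in it is invented for illustration; none of these are GoodData's actual interfaces.

```ts
// Invented names throughout: a schematic of the ingest -> stage -> distribute
// flow described above, not GoodData's real APIs.
type Row = Record<string, unknown>;

interface Connector {
  source: string; // e.g. "snowflake", "redshift", "bigquery", or a custom API
  extract(): Promise<Row[]>;
}

interface StagingWarehouse {
  // "ADS": stage raw rows, then transform them toward the logical data model.
  load(table: string, rows: Row[]): Promise<void>;
  transform(step: string): Promise<void>;
}

interface Workspace {
  id: string;
  customerId: string; // each workspace holds one customer's data subset
}

// "ADD": push each customer's slice of the transformed data into their
// workspace, assuming the staged schema matches the workspace's data model.
async function runPipeline(
  connectors: Connector[],
  ads: StagingWarehouse,
  workspaces: Workspace[],
  distributeSlice: (ws: Workspace) => Promise<void>
): Promise<void> {
  for (const c of connectors) {
    await ads.load(`raw_${c.source}`, await c.extract());
  }
  await ads.transform("conform_to_logical_data_model");
  await Promise.all(workspaces.map(distributeSlice));
}
```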
[00:28:16] Unknown:
And with the modularity of your architecture, I imagine that also simplifies the use case of letting your customers have different integration points into your platform for determining where in the life cycle of their data they want you to take over, because there might be some custom ETL logic that they want to do on their systems before they load it into their workspaces, or they might just have a data repository like a data lake or a data warehouse somewhere, and they just want you to do everything end to end. And I'm wondering what the options are for people who already have an existing data infrastructure and processing capabilities to lean on GoodData for just the pieces that they care about, and some of the examples of customers who are hooking in at those different stages of their life cycle?
[00:29:03] Unknown:
Absolutely. So what you're referring to is which pieces of the GoodData architecture the client would want to leverage. So we're talking about the data warehousing piece that Phil was talking about with ADS, where we could potentially get the aggregation of lots of different data sources in one place, whether or not that's something that the client wants to leverage from the GoodData side or own on their side. There's also the loading mechanism, the ADD piece, where we're talking about how the client would be able to load that data, whether they want to keep it on their side or actually keep it on the GoodData side. So the way we're able to manage that is really the flexibility of the types of sources we're able to download from, whether or not we are doing the transformations or just loading directly into the platform.
So with all the connectors that we have, with these prepackaged Ruby bricks that are leveraging the GoodData APIs as well as the source APIs, we're able to integrate their data and load into ADS through those connectors. Or, if the client wants to own a lot of the transformations themselves and match the exact metadata output for the semantic layer or the models that are on the workspaces, they're able to load that directly in with their data warehousing source through our Automated Data Distribution, or ADD, especially if they're using things like Snowflake, Redshift, or BigQuery.
[00:30:36] Unknown:
And in the overall system architecture of what you've built at GoodData, how much of it have you been able to leverage off-the-shelf components for, whether it's things like Kafka or pre-built data warehouse systems? And how much of it has had to be custom engineered because of the complexities that you're working around and having to design around in order to ensure that the entire system remains performant for a multitude of customers in a multi-tenant situation?
[00:31:02] Unknown:
Yep. So from an off-the-shelf component perspective, right, we're really talking about 2 primary areas, the first being the front end and the second being the back end. We are leveraging a pluggable UI framework on the front end. So this gives us the ability to use many publicly available UI components like React and Angular for the data presentation and visualizations. And what really is enabled here is that seamless integration. Right? You can basically build the custom client application and give it that slick look and feel to match your client's vision or their customers' needs, or maybe they have a specific style guide that they have to follow for all of their internal applications that they have built. So it really has infinite possibilities relying on this particular framework.
Similarly, our back end has a pluggable container-based architecture as well. And so that allows us to deploy custom code bits, like these modular bricks that we had referred to previously, which are essentially productized Ruby scripts that interact directly with our open APIs. And the idea is that we can deploy custom code that's written into our transformation processes and give kind of this more flexible architecture for ETL management and data pipeline management. And these bricks can be orchestrated into data transformation workflows for things like data ingestion in the context of our internal data warehousing.
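The "brick" concept lends itself to a short sketch: small, productized units of pipeline logic (Ruby scripts in GoodData's case) chained into a workflow. The shapes below are a hypothetical illustration in TypeScript, not the platform's actual contract.

```ts
// Hypothetical: a "brick" as a parameterized pipeline step that is
// orchestrated in sequence. GoodData's real bricks are productized Ruby
// scripts that call the platform's open APIs.
interface Brick {
  name: string;
  run(params: Record<string, string>): Promise<void>;
}

async function runWorkflow(bricks: Brick[], params: Record<string, string>) {
  for (const brick of bricks) {
    console.log(`running brick: ${brick.name}`);
    await brick.run(params); // e.g. download from a source, transform, load
  }
}

// Usage sketch: chain an extract brick, a transform brick, and a load brick.
// runWorkflow([downloadBrick, transformBrick, addLoadBrick], { tenant: "acme" });
```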
[00:32:53] Unknown:
And then for somebody who's building their data product on top of GoodData, what is the overall workflow for being able to go from concept to completion?
[00:33:03] Unknown:
Yeah. That's a great question. Right? So, you know, how do you build a data product? And the way that we approach it at GoodData is we really strive to build data products which focus on a specific end user persona. Right? Because we believe that this is what really highlights the value of having analytics, or embedded analytics, in the context of an application or a site or whatever that may be. Once we understand the mindset of those end users, we are then able to build the dashboards, reports, metrics, KPIs, right, all of these analytics which enable those individuals to make data-driven decisions in the context of their roles. You know, it's about understanding the problem and then getting to the answer.
And that's definitely achievable through the GoodData platform. When we talk about actually implementing an embedded analytical solution, it really breaks down into 5 main areas. And the first is getting that data. We need to extract and consolidate from, say, your application or your data warehouse or flat files that you're pulling from disparate areas of your business. Maybe it's a connection to a third party that we need to augment your internal data warehouse with, some other tooling that you're using outside of your infrastructure. And then we use this to create the data model. On top of that, the next thing that we'll do is build out the analytics.
This is where we create those dashboards, reports, the metrics, right, that are relevant to the questions that are looking to be answered. And this is something that we think every single one of the customers should be looking to answer. Right? Like, this is the standard. Once we have that standard, we go through our release and rollout processes. And this is where we often refer to something as a template or a master. Right? This is the standard reporting that everyone will get out of the box. And we take these life cycle management tools, and we help, you know, perform this rollout on either a stand-alone or maybe a set of embedded reports or dashboards, and we disseminate that to the entire customer base. And once the customers have access to it, right, and they have access to their data, they can go in and finally customize and build their own set of custom reporting.
Maybe they are looking at a different aspect of the business, or maybe they're looking at a reorganization that they need to solve for. And so we're really allowing the flexibility for the end user to formulate the insights that they need. I think the final piece of an implementation is how we operationalize everything, right, once it's in place. We need to manage and drive the entire end-to-end life cycle for all of those customers. Right? And we do this through, you know, things like provisioning and deprovisioning new customers and new users.
Or maybe it's in terms of managing growth and scalability or monitoring actual usage on the platform. So I think these are all of the key steps that we walk through in building, say, a brand new data product.
[00:36:29] Unknown:
And then the other component of building the data product, and the perspective of the customers in terms of working with their data, is I'm curious how the overall life cycle of the data flows through the GoodData product, from when the customer first collects the data through to delivering it to their end users, and ensuring that the overall experience is as performant and robust as possible?
[00:36:57] Unknown:
So the design that we have to work around to ensure performant access to the customer's data sources, in the case of creating a data product, I think comes from 2 sides. First is from the customer's data side: understanding the customer's data instance in the case that we're actually pulling things from their warehouse, or they're sending us files to be ingested into ADS. Those are the kinds of things that we want to take a look at. What is the actual health of their production instance? Can we have access to separate schemas and views to make sure that we aren't negatively impacting their production environment? What is the size of the data that we're putting in that we need to architect around? Do we need to provide some sort of incremental ingestion logic with deletions and all that kind of stuff, so that we can handle large volumes of data in an efficient manner?
Another thing to consider is, when we're looking at a client's data from ingestion to something that will match our logical data model, they're going to be very different. So from an analytical toolset, you don't necessarily need things at the most granular or transaction level. So when we're looking at things like that, how are we going to maintain some sort of data retention policy that matches the client's data, what they're giving us, versus what's actually going to be on the platform? So those are the things that we might need to think through and architect around. Another thing is, as I was mentioning before, about an aggregation of different data sources within our ADS layer. This is the kind of stuff that we like to showcase, where we centralize our data pipeline across multiple data sources. So this is something that the customer can leverage as well if they want to look at a variety of data sources. So let's look at, like, a sales example. They might have their own transactions separately within their MySQL database, but then they also want to pull in their Salesforce data. We can aggregate all that information if it's not available for them in their own data warehouse. And the last piece about just the data side is really looking at that custom connector piece. How are we actually getting that data over from their side over to us? And historically, we've had experiences in the past where we were able to build up these custom bricks or connectors so that our clients would be able to have their data migrated from their side over to GoodData. On the other side of this, in terms of the data migration, I would also say there is a performance level that we like to acknowledge from the platform perspective.
So when we're looking at large amounts of data, or if we're looking at near real-time analytics, there are things on the platform that we want to consider as well, like precaching. Maybe there is a very important meeting that a lot of people like the GoodData analytics for. We want to ensure that everything is cached before these kinds of important meetings or stuff like that. So we have scripts that make that possible. Another ability that we're able to give to our clients is the option for different hardware to handle high concurrency or high data volumes.
And there are also a lot of different ways that we've enhanced the logical data model to make sure that it is very reasonable to have performant access for our clients. And one way that we are able to do that is through many-to-many relationships. Rather than duplicating data and increasing data volumes on that data mart on the client workspace, we can leverage the many-to-many functionality in our data models to help clients with that kind of access.
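The incremental ingestion logic mentioned above (loading only what changed, and honoring deletions, so large volumes stay efficient) can be sketched roughly as follows. The watermark-based shape and all names are assumptions for illustration, not the platform's mechanism.

```ts
// Assumed watermark-based incremental load: fetch only rows changed since the
// last run and apply deletions, rather than reloading full tables.
interface ChangeBatch {
  upserts: { id: string; updatedAt: string; payload: unknown }[];
  deletedIds: string[];
}

interface IncrementalSource {
  changesSince(watermark: string): Promise<ChangeBatch>;
}

interface IncrementalTarget {
  upsert(rows: ChangeBatch["upserts"]): Promise<void>;
  remove(ids: string[]): Promise<void>;
  readWatermark(): Promise<string>;
  saveWatermark(w: string): Promise<void>;
}

async function incrementalLoad(src: IncrementalSource, dst: IncrementalTarget) {
  const since = await dst.readWatermark();
  const batch = await src.changesSince(since);
  await dst.upsert(batch.upserts);    // insert new rows, overwrite changed ones
  await dst.remove(batch.deletedIds); // propagate deletions (retention policy)
  // Advance the watermark to the newest change we actually processed.
  const newest = batch.upserts.reduce(
    (max, r) => (r.updatedAt > max ? r.updatedAt : max),
    since
  );
  await dst.saveWatermark(newest);
}
```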
[00:40:44] Unknown:
And in your experience of building the platform and working on it yourselves and working with your customers to ensure that they're having successful outcomes, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:41:00] Unknown:
I would say the most interesting part of working on client implementations of GoodData is the breadth of business use cases we are developing for our clients. So no one use case is the same. They're all very different, because all these different businesses have different business models and different data models. So I would say this was especially rewarding because of the constant innovative data challenges that we had to overcome with the GoodData platform.
[00:41:28] Unknown:
My take on this one is, based on all the implementations I've seen, it never ceases to amaze me, but there's always a customer that's trying to push the boundaries of platform functionality. Right? And this is always a tricky one, especially from a customer relationship standpoint. You want to help them meet their needs, right, within all of the platform limitations. And, of course, they want to do something new and creative, and you want to enable them to do that. But we need to find a healthy medium without compromising the end result or setting that client or customer up for eventual failure due to a lack of sustainability.
And so it's always very, you know, interesting, and it poses a unique challenge every single time it comes up, because you have to go back to the drawing board and be like: these are all of our confines, so how do we build something new? Right? Or maybe we reach out to product to help us extend some of the existing functionality. But oftentimes, the solution needs to be more immediate.
[00:42:37] Unknown:
And one of the interesting pieces of the overall GoodData platform as well is the fact that you have introduced a different interface for being able to define the logical models, using the MAQL, or Multi-Dimension Analytical Query Language, dialect. And I'm wondering if you can give a bit of a compare and contrast between that and SQL, and some of the benefits and additional functionality that MAQL provides.
[00:43:06] Unknown:
Yeah. So MAQL stands for Multi-Dimension Analytical Query Language. And this is GoodData's proprietary query language for creating metrics, or aggregations of the underlying data, that are defined in that semantic layer. This differs from SQL in that it's a streamlined analytical language with less code to write and maintain. So less technical folks find MAQL easier to use, and we've found that anyone familiar with SQL is able to pick up MAQL very quickly. The key advantages that I would state about MAQL are: one, it works out of the box with the GoodData platform. It's something that is innate in our platform. It's also multidimensional.
So this pairs very well with our semantic layer. And going back to what I was saying about less code: there are no joins or subqueries in the MAQL queries that are stated to define a metric, because it works on top of the logical data model. So these queries are already context aware. And talking about the context-aware piece, with everything being semantically related through the logical data model, any metric can also be immediately used for reporting and can be reused again. So this is something that can be utilized for all of our clients. It doesn't need to be rewritten. It can be precomposed and then used by 1,000 or tens of thousands of users across their reports.
There's also the composability of these metrics. So you can put in nested metrics and build foundation metrics. That way, when you drag and drop this into your analytical designer interface, you can apply different filters and all that kind of stuff. So there is that capability as well. And I would say the last piece, a key advantage of MAQL, is the resiliency. So a lot of the time, when there is some, like, source-to-target mapping change that would require some sort of significant refactoring of the actual data model, this sits exactly on the semantic layer. So there really isn't a serious impact on the existing metrics or reports, unless, of course, there was something like a serious LDM change made on the front end that needed to be released and rolled out. But I would say, in general, the benefits of MAQL are less code; it's context aware because of the semantic layer;
composing metrics and reuse is very easy; and the resiliency.
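To give a flavor of the contrast being described, here is a hedged, simplified example over an assumed sales model (an Orders fact with an Amount measure, related to a Status attribute). The MAQL syntax is approximate; the point is that the logical data model supplies the join paths that SQL must spell out.

```sql
-- MAQL metric (approximate syntax): no joins, because the semantic layer
-- already knows how Amount relates to Status.
SELECT SUM(Amount) WHERE Status = "Won"

-- Roughly equivalent SQL over assumed underlying tables, with explicit joins:
SELECT s.status, SUM(o.amount)
FROM orders o
JOIN order_status s ON s.status_id = o.status_id
WHERE s.status = 'Won'
GROUP BY s.status;
```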
[00:45:36] Unknown:
And in terms of good data, what are the cases where it's the wrong choice and someone might be better suited using a vertically integrated internal platform or building out their own analytics solution for exposing to end users?
[00:45:50] Unknown:
Yeah. So if you're seeking a single static data visualization for your management team, or maybe a public chart for your website, GoodData will be too complex of a platform for you. In this case, using data visualization tools or libraries would be better suited. However, if you plan to do anything more than just a simple static data visualization, you would need to find something that's more reliable, that does have this life cycle management, pretty much what GoodData has.
[00:46:21] Unknown:
And what do you have on the road map for the future of GoodData in terms of new capabilities or just overall improvements or new use cases that you're looking to provide?
[00:46:32] Unknown:
GoodData continues on the trend of analytics everywhere for everyone, and this includes improvements on data integration options, bringing better data visualizations on the front end that are available through analytical dashboards, bringing better collaboration between data engineers and analysts, and improving the self-service analytics ease of use for non-analysts.
[00:46:55] Unknown:
And to add to that, I think another component of our road map that's really exciting for us is we're working on a newer Kubernetes-based deployment option of GoodData. Right? And this really will help us enable colocating the analytics with a SaaS application that may be deployed, say, in a public or private cloud platform or in a local on-prem data center. So the goal is to enable the same functionality that we get out of the cloud-hosted GoodData platform, maybe for companies that need more enhanced control or stricter guidelines, or just want to feel like they have full ownership and it's less of a managed service that we are providing to them. Are there any other aspects of the space of embedded analytics or
[00:47:46] Unknown:
the product that you're building at GoodData, or anything else in the business intelligence and analytics space that we didn't discuss, that you'd like to cover before we close out the show? I think that we touched on a lot of the key aspects and the reasons why we think that GoodData has
[00:48:01] Unknown:
a competitive advantage. So I think, from my perspective, we touched on a lot of things I hope the larger audience will find useful, and that we presented, like, information from the GoodData perspective,
[00:48:20] Unknown:
as well. Alright. Well, for anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you each add your preferred information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. Yeah. I mean,
[00:48:39] Unknown:
that's always the big question, right? What's next? Where are we heading? I think, you know, I see that there could be improvements in terms of the data cleansing area of data management and data engineering. Right? I know, you know, personally, a lot of people, including myself, spend a lot of time debugging or cleaning up datasets, or trying to ensure that, you know, the data is clean enough to run end to end through ETL. And, you know, part of that is error handling. Part of that is, you know, maybe removing records, right, which could lead to some type of inconsistency.
Right? So, you know, the area I would like to see development in would be some, you know, very customizable data cleansing solution that has a very flexible integration with other analytical tools. You know, a drag-and-drop, similar to how we mentioned earlier, you know, a pluggable UI framework. Right? Is there a way that someone could build a solution that would integrate directly into our native technologies? And we could customize that and deliver that as a packaged option to our customers as well, and really limit the amount of time that it takes to process and handle all of the data, limit the troubleshooting, and hopefully free up time for more value-add activities.
[00:50:12] Unknown:
To add to that, I would say not necessarily a big gap, but a big change that we will probably see in the future is, as more users are getting access to tools where they have access to data, maybe data they didn't have access to before, as Phil was mentioning in a use case earlier, we're going to need improvements to make semantics, and relationships in data in general, a little bit easier to understand. So even though we do have, like, a semantic layer, and other tools may have something similar to make it easier for their end users to actually utilize,
I predict in the future that we will need to simplify this even further for a wider audience.
[00:50:51] Unknown:
Alright. Well, thank you both very much for taking the time today to join me and discuss the work that you're doing with GoodData, empowering embedded analytics for end users and making the overall analytics space more accessible. It's definitely a very interesting product, and I had a lot of fun learning about it as I prepared for the show. So thank you both for all of the time and energy you put into that, and I hope you enjoy the rest of your day. Yep. Thank you, Tobias. Thanks for having us today. Yeah. Feel free to share our contact information. We're happy to communicate offline with anyone who has any open questions or concerns or follow-ups that are needed.
[00:51:27] Unknown:
Yeah. We appreciate your time as well. Thanks. Thank you so much, Tobias. Thanks for having us. Bye.
[00:51:38] Unknown:
Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Project Announcement
Guest Introductions
GoodData's Origin and Mission
GoodData's Unique Selling Points
Use Cases and Industry Applications
Trends in Business Intelligence
APIs and Developer Integration
GoodData's Architecture
Integration Points and Customer Examples
Data Flow and Performance
Lessons Learned
MAQL vs SQL
When GoodData is Not the Right Choice
Future Roadmap
Closing Thoughts and Final Question