Summary
Controlling access to a database is a solved problem… right? It can be straightforward for small teams and a small number of storage engines, but once either or both of those start to scale, things quickly become complex and difficult to manage. After years of running across the same issues in numerous companies and even more projects, Justin McCarthy built strongDM to solve database access management for everyone. In this episode he explains how the strongDM proxy works to grant and audit access to storage systems and the benefits that it provides to engineers and team leads.
Introduction
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Your host is Tobias Macey and today I’m interviewing Justin McCarthy about StrongDM, a hosted service that simplifies access controls for your data
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by explaining the problem that StrongDM is solving and how the company got started?
- What are some of the most common challenges around managing access and authentication for data storage systems?
- What are some of the most interesting workarounds that you have seen?
- Which areas of authentication, authorization, and auditing are most commonly overlooked or misunderstood?
- Can you describe the architecture of your system?
- What strategies have you used to enable interfacing with such a wide variety of storage systems?
- What additional capabilities do you provide beyond what is natively available in the underlying systems?
- What are some of the most difficult aspects of managing varying levels of permission for different roles across the diversity of platforms that you support, given that they each have different capabilities natively?
- For a customer who is onboarding, what is involved in setting up your platform to integrate with their systems?
- What are some of the assumptions that you made about your problem domain and market when you first started which have been disproven?
- How do organizations in different industries react to your product and how do their policies around granting access to data differ?
- What are some of the most interesting/unexpected/challenging lessons that you have learned in the process of building and growing StrongDM?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- StrongDM
- Authentication vs. Authorization
- Hashicorp Vault
- Configuration Management
- Chef
- Puppet
- SaltStack
- Ansible
- Okta
- SSO (Single Sign-On)
- SOC 2
- Two Factor Authentication
- SSH (Secure SHell)
- RDP
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello. Welcome to the Data Engineering podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy them. So check out Linode. With 200 gigabit private networking, scalable shared block storage, and a 40 gigabit public network, you've got everything you need to run a fast, reliable, and bulletproof data platform.
[00:00:33] Unknown:
If you need global distribution, they've got that covered too with worldwide data centers, including new ones in Toronto and Mumbai.
[00:00:39] Unknown:
Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And go to dataengineeringpodcast.com
[00:00:47] Unknown:
to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. And don't forget to go to dataengineeringpodcast.com/chat to join the community and keep the conversation going. Your host is Tobias Macey. And today, I'm interviewing Justin McCarthy about StrongDM, a hosted service that simplifies access controls for your data. So, Justin, could you start by introducing yourself?
[00:01:08] Unknown:
Sure. Hey, my name is Justin. I'm 1 of the co founders of StrongDM. And, like you just said, we build, we build an access control product for databases and servers.
[00:01:17] Unknown:
And do you remember how you first got involved into the area of data management?
[00:01:21] Unknown:
Yeah. So data management, I think, has been, it's not lifelong perhaps, but I guess a career-long area of involvement for me. This was never really by choice initially, but I think it became a choice over time. I was just sort of always the 1 on the team that was interested in building it and capable of building it, whatever it was. And so, yeah, definitely many, many data warehouses, many, many databases built over the years, for a lot of different companies and for a lot of different reasons.
[00:01:52] Unknown:
And given that background and your experience of building out all these different data platforms, I'm wondering if you can talk through how that led into your inspiration and motivation for creating strongDM, and a bit about what the product does and the problem that you're trying to solve with it?
[00:02:10] Unknown:
Sure. So I would say it's a pretty classic founding story of scratching an itch, for me at least. So fundamentally the product is a protocol aware proxy with an embedded credential repository. So we sit in front of data stores and servers, and for users that need access to those data stores, we grant access by proxying their requests. And so that came about because, actually, while working on a previous product, we were faced with a situation of basically just too many credentials. So in this case, it was too many credentials to too many sensitive data stores that were necessary for the operation of the product but just didn't feel right.
It felt like there were too many secrets that were too easily accessible and needed to be too widely distributed to too many people. So just the feeling that it needed to be more consolidated, more consistent, and more auditable. And then that it also needed to be more convenient. So that was the original impetus for the product.
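To make the "protocol aware proxy" idea a little more concrete, here is a deliberately toy sketch of a forwarding proxy in Python. It is not strongDM's implementation (a real protocol-aware proxy parses each database's wire protocol and swaps credentials during the authentication handshake); the ports and hostnames are invented for illustration.

```python
# Toy sketch of "sit in front of the data store and forward traffic".
# A real protocol-aware proxy would parse the database wire protocol and
# rewrite the authentication exchange; this only copies bytes both ways.
import asyncio

LISTEN_PORT = 15432                 # hypothetical local port clients connect to
UPSTREAM = ("db.internal", 5432)    # hypothetical real database address

async def pipe(reader, writer):
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_reader, client_writer):
    # In a protocol-aware proxy, this is where the user's SSO identity would
    # come in and the stored database credential would go out.
    upstream_reader, upstream_writer = await asyncio.open_connection(*UPSTREAM)
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )

async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", LISTEN_PORT)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```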
[00:03:25] Unknown:
And as far as the sort of many to many matrix of credentials to users to services, I'm wondering what you have found to be some of the most common challenges as far as managing and granting access and authentication for those different storage systems.
[00:03:44] Unknown:
So I would say by far, across all the customers we talk to and across our personal experiences, by far the biggest challenge is this trade off between convenience and security. You're either too locked down, in which case people can't get their jobs done, or you're too open, so you've over provisioned access and then you have your summer intern that doesn't really need access to, you know, the transaction details on a particularly sensitive area of the dataset, but you haven't carved it up in a way to really grant it precisely. And so I would say that that trade off and that tension exists for, as near as we can figure, every team out there.
[00:04:27] Unknown:
And given the number of different storage back ends that you're supporting and the varying levels of native capabilities for being able to do any sort of access control or filtering, I'm wondering how much of that you've had to incorporate into your proxy layer and whatever facilities you build in to make it easy to have a fairly well scoped role definition for people who are onboarding onto your platform.
[00:04:55] Unknown:
Yeah. So the thing that I think has made our product interesting to work on, and makes it interesting to sort of watch customers using it, is actually the heterogeneity of the modern data environment. It's very rare that you have only 1 database type. Right? But each of the database types and each of the data environments have very different ways of expressing permissions. So in our product, what we do is we rely on the underlying permissions engine. So, you know, the grants that you issue in Mongo or in Microsoft SQL Server or in MySQL, all of those are still in place and you still rely on those.
Once you have created a good permission set that you're happy with, our product then makes it really easy to distribute that permission set to every member of your team, based on their role, or even based on things like time bound access or something like that. So you can, very confidently and with a very low likelihood of making a mistake, grant exactly those least privileges to the team members. And, of course, all that is through roles that are often defined in an existing SSO or some other kind of directory. So I think the heterogeneity, handled through the role based model, getting it all working and then making it easy to grant, is pretty much the core of what I enjoy about deploying the product.
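As a rough illustration of the layering Justin describes, the sketch below models roles as a mapping onto grants that already exist natively in each engine, with optional expiry for time-bound access. All of the names and the data model are hypothetical, invented for this example; they are not strongDM's schema.

```python
# Hypothetical sketch: roles map onto native grants that already exist in
# each database engine; the layer on top only decides who gets which grant
# and for how long.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class Grant:
    datasource: str                        # logical name, e.g. "transaction-db"
    native_role: str                       # the grant already defined in the engine
    expires_at: Optional[datetime] = None  # None means standing access

ROLE_GRANTS = {
    "engineering": [Grant("orders-mysql", "read_write")],
    "analytics": [
        Grant("transaction-db", "read_only"),
        Grant("warehouse-redshift", "analyst"),
    ],
}

def active_grants(role: str, now: Optional[datetime] = None) -> List[Grant]:
    """Return the grants a role currently holds, dropping any that expired."""
    now = now or datetime.now(timezone.utc)
    return [g for g in ROLE_GRANTS.get(role, [])
            if g.expires_at is None or g.expires_at > now]

# Time-bound access: add a grant that simply expires on its own.
ROLE_GRANTS["engineering"].append(
    Grant("transaction-db", "read_only",
          expires_at=datetime.now(timezone.utc) + timedelta(hours=4)))

print([g.datasource for g in active_grants("engineering")])
```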
[00:06:26] Unknown:
And there are other potential combinations of tools or approaches that could be taken, most notably in my own experience things like the Vault project from HashiCorp, as far as being able to create these role definitions and then generate credentials from that. But there can also be a fair amount of complexity in terms of managing access to those different role definitions. And I'm curious along those lines what you have seen as far as common workarounds or common setups for people who are trying to implement their own homegrown solutions to the same problem that you're trying to solve with strongDM?
[00:07:03] Unknown:
Sure. Yeah. I think what we see out there in the wild is, we do see a lot of Vault, as you mentioned. We see a lot of, I would say, pretty standard use of the configuration or orchestration systems, you know, the Chefs and Puppets and Ansibles, to actually execute some of this creation of roles and users and credentials. So that's all pretty common out there. Actually, at the beginning of our product, it sort of started from what we wanted the end user experience to look like when they actually reach out to access the database. So if you can think of an analyst on a finance team, they just know that they need to access the financial transaction database, and it's maybe stored in Microsoft SQL Server.
They don't particularly care whether it's hosted in Azure or AWS or on prem somewhere. They just wanna click on something that they have a logical name for, like the transaction DB, and be granted access. So because our vision of the product started from that end user experience, it's actually the last mile that tends to be the differentiation. So whereas it's very possible and appropriate and a great idea to automate the creation of your role definitions using some of those tools like Vault or really any of the configuration management systems, that's a completely appropriate thing to do.
Handing that credential securely to an end user, I think, is the part where there's no workaround that we've found for the full end to end proxy that we've
[00:08:46] Unknown:
created. And so for that end user experience, I'm wondering if you can just talk through the overall workflow of somebody going into your platform, requesting access to a given system, how that access is granted and audited and controlled, and how the actual connection is established, whether they are given back a set of credentials that they put into the database access GUI or command line shell that they might be using, or if they access a proxy directly through your platform, or how that all works?
[00:09:18] Unknown:
Yeah. Sure. So there's a central API and a central system that is the container of essentially all of the role definitions. So Justin, as a member of the engineering team, is logically granted access to data source a, b, and c. Right? And those data sources, 1 might be Redis, 1 might be Redshift, 1 might be DynamoDB. So I logically have been granted those permissions in the system. Practically, as an end user, when I go to access it, I'm accessing it through a local client. So that local client has a GUI component if you happen to be using a GUI, or there's a command line component if you prefer the command line. And in essence, what it is doing is it is creating a local proxy that then forwards your queries, we say horizontally, through a series of relays to the destination database.
So, locally, if I need to access it, let's say in a query tool or from code, I simply update my data source connect string to point to localhost. And then the credential that I'm actually using locally is completely disconnected from the final credential that is authenticated into the system. And the way I authenticate locally is typically through the single sign on provider or other directory system that I'm using on the team. So I issue a local login that bounces me over to maybe, you know, let's say Okta or Google, for SSO. I'm authenticated in, and then essentially the system has created a tunnel that knows that Justin is the 1 accessing it. But by the time it gets to the database, it might be, you know, read only user 7 is the underlying credential name.
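For a sense of what the "point your connect string at localhost" step looks like from the end user's side, here is a small example using psycopg2 against a Postgres data source. The local port and throwaway local credentials are hypothetical; the point is that the real database credential never appears in the client's configuration.

```python
# Hypothetical end-user view: the query tool or application connects to a
# local proxy; authentication to the real database happens upstream.
import psycopg2

# Before: a DSN with the real host and a real, shared credential.
# dsn = "host=prod-db.internal port=5432 user=readonly_7 password=s3cret dbname=orders"

# After: the local client listens on localhost, so the local credential is inert.
dsn = "host=127.0.0.1 port=15432 user=local password=local dbname=orders"

with psycopg2.connect(dsn) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders")   # "orders" is a made-up table
        print(cur.fetchone())
```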
[00:11:02] Unknown:
And then as far as the audit trail, is that entirely managed within strongDM, or do you also rely on server logs from the different back end systems for being able to trace the interactions of the user with the back end system in the event that there's some sort of breach, or if you're just trying to do maybe compliance auditing for making sure that all of your systems are being accessed in a manner as defined by whatever compliance policy you're adhering to? Yeah. So what we found there is that,
[00:11:33] Unknown:
all of the database and data management systems have very different capabilities in terms of what they log, and very different costs in terms of what they log. So, for example, flipping on logging in your highest throughput production database will definitely compete with your performance on that production database. So what we've done is we've actually taken all of that on. So every query, or in the case of servers, every SSH keystroke, as you're using something you've been granted access to, all of that is sent into an archival log that is obviously encrypted. And, essentially, if you need to retrieve a particular moment or a particular query from a user's interaction history with that data source, then it's directly available in the strongDM tool. And there's essentially a command line utility to extract that.
And then you can pipe that into whatever system you're using for compliance or for security alerting. All of those things are very common deployment scenarios.
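As a hedged sketch of what "extract the audit trail and pipe it into your alerting or compliance system" could look like, the snippet below filters a stream of audit records. The record format, field names, and the idea of JSON-lines output are all assumptions made for illustration, not strongDM's actual export format.

```python
# Hypothetical post-processing of an audit export: read structured records,
# flag the ones a compliance or security team cares about.
import json
import sys
from datetime import datetime

WATCHED_USERS = {"contractor-42"}        # invented example of a user to watch

def alerts(lines):
    for line in lines:
        event = json.loads(line)          # e.g. {"ts": ..., "user": ..., "datasource": ..., "query": ...}
        if event["user"] in WATCHED_USERS or "ssn" in event["query"].lower():
            yield event

if __name__ == "__main__":
    for event in alerts(sys.stdin):
        ts = datetime.fromisoformat(event["ts"])
        print(f"[ALERT] {ts:%Y-%m-%d %H:%M} {event['user']} -> "
              f"{event['datasource']}: {event['query'][:80]}")
```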
[00:12:41] Unknown:
And as far as the types of industries or users or scales of organizations that you have brought on as customers, I'm curious what the commonalities are in terms of the needs that bring them to you, whether it's because they're scaling beyond a certain point and they need to be able to have a clear way of managing access, or if it's a lot of large organizations or financial or health care industries that need to be able to be compliant with their industry standards. Just wondering what the sort of tipping point is for when somebody should be starting to think about a product like StrongDM to suit their needs.
[00:13:22] Unknown:
So there are a few triggers and a few characteristics that make a team really well suited to the problem. So first is definitely team size. If you're an individual or you've got a team of 2, then you usually don't have an access control problem. The moment your team size gets up to the dozens, then essentially you always have someone that's leaving or joining the team. So you're always onboarding or offboarding. And then, again, it's rare that you only have 1 database. Right? So you have maybe half a dozen or even a dozen data sources that an engineer or an analyst might need to touch in their day to day, even when your team size is just a few dozen.
So that's sort of the floor of team size. The other pattern that we see is basically just a function of the data sensitivity. So the moment something about your business is objectively sensitive, so if it deals with financial data, if it deals with health data, anything that you wouldn't want tweeted out, that tends to be a really strong indicator. And then I'll also say maybe 2 other ones that come to mind are just hypergrowth. So not just when your team size reaches a couple dozen, but if your team size is growing by, you know, a 100 or 200 every year, then simply the convenience of having a rational way of granting and revoking access that works across lots of different skill sets and works for users of every type, that alone is perhaps a reason to upgrade your process. And then the other 1 is, of course, compliance. So I would say a significant fraction of our customers rely on evidence that's collected by StrongDM, for example, in their SOC 2 compliance process, as well as other compliance regimes like HIPAA. And as organizations
[00:15:14] Unknown:
and teams are onboarding into your system, I'm wondering what you have found to be some of the most interesting or clever or, obtuse workarounds that people have come up with for being able to manage these access control patterns within their own teams before they go to a more unified solution like StrongDM?
[00:15:34] Unknown:
Sure. So, you know, 1 thing I think is that when you're deploying a product that touches as sensitive an area as any kind of access control product does, and I'm sure if you're deploying a single sign on product it's a similar story, or a password manager, it's a similar story, what you find is that actually there's a lot of diversity in terms of how complete and how security conscious some of the existing practices are. So what I find is that we end up somewhat playing the role of almost like a trusted physician, and we have to basically say, we've seen it all before.
And I will say that, yeah, I'm actually a little bit shy to get into some of the things we've seen before a complete deployment. But let's just say that there's always room for improvement out there.
[00:16:34] Unknown:
Yeah. As somebody who works in operations and has been, sort of put into situations that necessitate various hacks, I I can definitely relate to that and some of the, I guess, unwillingness to publicly display, what is necessary in various situations.
[00:16:51] Unknown:
Yeah. And actually that goes back to some of the original catalyst for the product, just realizing, you know, when you're distributing a credential for a data store, you've got a host name, you've got a port number, you've got maybe some sort of data catalog indicator, and you've got a username and password. You end up, you know, texting half of that to the person who needs it, sending the other half over Skype, maybe putting some of it in 1Password. And if you can just imagine that process being repeated the world over every time someone needs access, and this is certainly what I was doing when I was doing that, I was just shaking my head and thinking, this feels silly.
[00:17:32] Unknown:
And in terms of the areas of authentication and authorization and auditing that are necessary for a well managed solution for controlling access to data sources, what are some of the areas or aspects of that overall problem space that are most often overlooked by teams who ultimately end up working with strongDM? Sure.
[00:17:55] Unknown:
So 1 thing I would say is that, you know, in an era where a shop could be wall to wall Microsoft or wall to wall Oracle, it was a solvable problem to have an expert team member, called the DBA, that actually had expertise in the full capabilities of the database engine. So in 2019 and beyond, that seems less and less possible. Each data storage engine has its own way of granting, its own way of guarding a table or a column, its own way of restricting a row. And actually having an expert in each of them on the team seems like a scenario that we don't encounter very often. So I would say it's very common to find a data store where the grants have been sort of good enough. But then if you need to create a new database or you need to create a variant, someone has to go and recall all of the permission statements for that particular engine, for DynamoDB for example, or for that particular Mongo variant. And it's in that process that I feel like a lot, not of negligence, but just a forgivable and understandable reality, that it's hard for anyone to keep in their head how to do a good grant in Postgres and Microsoft SQL Server and Mongo, because they're all so different. So I would say that's a very common situation out there. Just through the diversity and heterogeneity of the data engines, you end up with sort of inconsistent quality and inconsistent completeness in granting, and that tends to err on the side of over granting. So that tends to give people write access where you don't want them to have write access, or read access where they shouldn't have it, just because it's impractical
[00:19:54] Unknown:
to tailor it any finer than that. And also, particularly for a number of these newer storage systems that aren't as full featured as some of the relational databases or some of the systems that are much further along in their level of maturity, they don't even have an appropriate level of granularity for access control, whether it's just because that was never the intent for the way the storage system was designed to be used, or just because actually implementing those controls is too difficult given the way that the information is stored on disk and things like that. So I imagine that there is a fair number of cases that you see along those lines too, where people might be using things like Redis in particular, where it's just a key value store and there isn't really a lot of granularity that's even possible in terms of the types of access that somebody can be granted. Yeah. Absolutely.
[00:20:42] Unknown:
And, actually, a couple of other data stores come to mind, where the original sort of founding moment of the product was obviously before the modern era of data science, and before the modern era even of the current conversation we're having around privacy. And, you know, if I'm designing Hadoop, if I'm designing the original few moments of Mongo, you know, I'm thinking in a very different space. Although I do wanna give a shout out to Mongo, where they started with very limited access controls. Their modern access controls, I feel like they're really well thought through.
So so it is interesting also to see that some organizations have been able to improve their story, significantly.
[00:21:22] Unknown:
And given the access control capabilities of these underlying systems and that level of variability, what types of additional capabilities do you provide at the proxy layer for being able to implement some of the access control and auditing capabilities that aren't necessarily supported in those underlying systems?
[00:21:43] Unknown:
Sure. The 2 that come to mind are, first, being able to introduce things like time based access control, temporary access. This particularly comes into play when we work with engineering teams that support a production product. So inevitably at some point, even though you nominally don't need access to the live data, there's always a moment where the only way you can diagnose or fix a problem is by jumping into the center of the system. You need to have access to that for availability reasons, for customer support reasons. You need to be able to do it, but obviously it should be very rarely and carefully granted. So 1 way to think of it is, you have a support ticket. It's something that can only be diagnosed or fixed in production. Well, your access to that production data store should exist for exactly as long as that support ticket exists. Right? And so that's an element that would never exist in any of these underlying data stores, but it's sort of a natural byproduct of accessing them through a proxy. The other 1 that comes to mind is, and this is an interesting question when you start to think about data breaches, your duty to disclose data breaches oftentimes is a function of the forensics that you can gather about it. So if you had a compromised workstation at some point, and theoretically that workstation had access to 10 databases, but you had no evidence about what was accessed, then you may find yourself with a duty to disclose a whole lot about a data breach. Right? Because you couldn't prove otherwise, that the data wasn't breached. It may be that that workstation was just compromised to mine some coins of some type, and they were never interested in the data. Right? And so with the proxy approach, just because everything is logged, you can say very crisply and very clearly that although in principle this workstation was compromised, in fact no data was ever transmitted down to it. And do you provide any sort of
[00:23:48] Unknown:
masking capability of the data that is retrieved so that, for instance, if somebody is querying a database that has potentially sensitive information, but they only actually need to care about the user ID and time stamp without actually getting, you know, the the user address for instance? Or do you just rely on the native capabilities of the database as far as how the, permissions can be granted at either row or column level, if that's something that is
[00:24:15] Unknown:
available in that storage system? So this answer is gonna sound, I don't know, maybe this is a controversial answer, but this is actually a really important area of product design for us. Because obviously being able to do data masking in a general way is a very appealing prospect. And it's easy to think, gosh, I would love to be able to look at a list of records from a table that maybe contains Social Security numbers, but I would love to never see the Social Security numbers myself. And it's easy to want a proxy tool to do the masking for me. What we found in our experiments with implementing systems like this is that, especially in a heterogeneous environment, especially in an environment where databases themselves have a lot of language and a lot of capability, where you have full or substantial access to SQL or to JavaScript or Lua, depending on which database you're talking about, it's far too easy to hide data from the masking engine. So it's far too easy to, for example, embed a number in a string, and then all of a sudden it's gone in terms of the semantics of being able to mask. So currently, our product does rely on thoughtful restriction in the underlying permissions engine rather than trying to mask it after the fact. There's just a ton of ways you can imagine of hiding your data from a masking engine. Right? And the other thing too is that particularly
[00:25:44] Unknown:
if the database is part of an active system that's seeing constant evolution, then it's just a whole another level of effort required to even keep up to date with what fields are in the database Mhmm. Let alone whether or not they need to be masked and having to maintain a consistent data catalog of all of those different fields and what the semantic meaning behind those fields are in order to be able to have any understanding of what types of operations need to be performed to protect sensitive information unless you're deploying some sort of natural language processing or or machine learning project embedded in the proxy to try and do some sort of on the fly intelligent analysis of the data as it's being returned over the wire.
[00:26:22] Unknown:
Yep. Yeah. Exactly. On the other hand, we do provide hooks to, for example, make it really easy to alert on an access of a sensitive field. So if, in your particular database of choice, you know for sure that this column should never be involved in any query, and you can express that in some automated way, then we can give you hooks to help out with that. So right now, it's sort of make the audit data ubiquitously
[00:26:48] Unknown:
and quickly accessible, and then rely on feeding that into an alerting system. So that tends to be how we're handling that. And then as far as the actual proxy layer and managing access to these underlying systems, what have you found to be some of the most useful strategies and the most challenging aspects of being able to maintain communication with all of these different engines, particularly as they have new product releases and possibly change some of the semantics or APIs that are available for accessing those types of systems?
[00:27:22] Unknown:
Actually, this is definitely 1 of the main areas of maintenance for our product. I will say 1 beneficial reality of the way data stores evolve is that they don't change as fast as JavaScript frameworks. Their customers expect a degree of stability. And so usually, essentially through release notifications, they give us a heads up that we are gonna need to, for example, introduce a new aspect of the authentication protocol. And so we are able to comfortably adapt to those, you know, sort of on the schedule of the underlying data stores. Thankfully, also, even though each data store protocol is a different species, they all have shared characteristics. So they have some similar morphology. So, you know, a giraffe and a mouse can be compared, and we see that a lot in these protocols. We see the authentication phases. We see the cryptographic primitives that are in use. We see the ways to demarcate a request from a response and to, for example, you know, page through result sets. Those have similar characteristics. And so even though we end up implementing very custom code to handle MySQL, if you go up a level of abstraction, it has a similar shape to, you know, what the code looks like for,
[00:28:46] Unknown:
Oracle. And going back to the point that we were discussing as far as managing access to specific locations in the underlying data, do you have any built in support for having stored queries that are commonly used across the team, for maybe debugging sessions or for being able to do quick out of the box analyses, or do you rely on higher level systems that interface directly with strongDM for being able to manage some of those stored queries and stored access patterns?
[00:29:18] Unknown:
This is on all of our wish lists. Like, if we had unlimited time and resources, we'd love to be able to have a shared pasteboard for common queries and, you know, little minimal query clients for each of the data stores, because I think 1 aspect of working in a system like this is it changes your point of view. These are no longer just a bunch of host names in a list somewhere. They feel more like, you know, logical abstract data sources that you have access to, and they have the same name and icon for all the members of your team. So I don't know. Somehow it feels more social and friendly when you're using it. And so you definitely reach for that, like, I just wanna copy this query over to your workstation. None of those gestures exist in our product today. So we do rely on higher level tools. But I love that idea and hope you'll give me permission to build it.
[00:30:11] Unknown:
And for somebody who's onboarding onto strongDM and maybe has their own preexisting access patterns and access control mechanisms, what is the overall process of getting set up with your platform and getting it integrated into their systems and their environments?
[00:30:29] Unknown:
So the main step is essentially deploying the proxy itself. So there's a client that runs on your workstation. It's a GUI if you have a GUI operating system. It's a CLI, like I mentioned before, if you prefer the CLI. And then that client has to talk to something. So it talks to the next hop in the proxy network, which we call a gateway. What it's doing is it's actually multiplexing all of the connections from your local database client, or from your local database driver if you're running code. It's multiplexing all of those through 1 TLS connection that hits the gateway, basically. So you need to find a place for that gateway that has ingress from wherever the client is connecting, and then has accessibility, ultimately, to the target data stores. So a common place to put that might be adjacent to whatever bastion or jump host you have on your network. So you put our gateway there. If you had an interior subnet, you could have another hop. You could have a relay, so you can actually chain these proxies together arbitrarily deep. And once you've got that set up, it's just a matter of importing your data source definitions. So you do that through the CLI. You can do a bulk import if you have hundreds or thousands of databases.
Or you can just click around in the interface and add them. And then, really, just in a few minutes, if you've got single sign on in place then, boom, your whole team can be on. They can pull up the list of data sources in the drop down, click on the data source they know and love and use every day, and they're off to the races.
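To illustrate the bulk-import step in a hedged way, here is one plausible shape for turning an existing database inventory into a list of logical data source definitions. The field names and files are invented for this example and are not strongDM's actual import schema or CLI.

```python
# Hypothetical preparation of a bulk data source import from an inventory
# you already maintain (one row per database).
import csv
import json

def to_definition(row):
    return {
        "name": row["logical_name"],   # what end users see, e.g. "transaction-db"
        "type": row["engine"],         # postgres, mysql, mongodb, ...
        "hostname": row["host"],
        "port": int(row["port"]),
    }

with open("inventory.csv", newline="") as f:
    definitions = [to_definition(row) for row in csv.DictReader(f)]

with open("datasources.json", "w") as f:
    json.dump(definitions, f, indent=2)

print(f"prepared {len(definitions)} data source definitions for import")
```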
[00:32:00] Unknown:
And in the beginning of building this product and embarking on the process of building a company around it, there are a certain set of assumptions that you have in mind that you are looking to fulfill. And in the process of building the product and growing the company and interacting with your customers, I'm curious which of those have been disproven and which have held true in your overall journey through building this company.
[00:32:28] Unknown:
Sure. I definitely have an answer for that. And actually, I'm not sure if it's been disproven or if it's held true. I think maybe they're both sides of the same coin, and that is about what data stores people actually tend to use. So I feel like, a few years ago, we had a really interesting bloom of innovation in data stores. So we had a lot of new open source databases that were created, some hybrid open source and some pure commercial new data stores that were created. And I feel like it's starting to shake out now, and we know which ones are sort of long term and which ones were maybe experiments that ended. But what I'll say is, at the time that we were founding the company, it wasn't clear how much staying power, I guess, the traditional sort of usual suspects would have. So it turns out Postgres isn't going away.
And it turns out there's still a lot of reasons to use MySQL as well, you know. At the same time, it also turns out that there are a lot of shops out there that continue to love their Microsoft SQL Server. There are certainly more modern data stores that are heavily in use. But depending on their degree of specialization, they might be in use for a specialized part of the business model. And perhaps the data converges onto something that has a more traditional data model. So it might converge into Redshift, for example, even if some specialized aspects of the system are in a more exotic data store. So I would say, yeah, in summary, adoption of data stores, how it's actually shaken out, I think that is both. I'm surprised and not too surprised at how it has shaken out.
[00:34:09] Unknown:
And in terms of the authentication layer, I know that you said in some cases you can delegate to single sign on systems, but I'm wondering if you have any native support for being able to integrate with things like 2 factor auth, or if you rely purely on the SSO systems that might be deployed within these organizations to manage that layer of access?
[00:34:30] Unknown:
Yeah. We do have native 2 factor. This is 1 of those cases where I would say I think we found our way to this feature through dogfooding the product, more than even getting requests from customers. There are certain data stores that are just so sensitive that you want maybe even the sort of nuclear launch code style 2 key situation where you need 2 individuals to access them. And a lesser version of that is certainly being able to designate a data store as MFA required. So in our clients, and, again, this works across all the data sources you have access to, the client will idle out and it'll require an MFA. Depending on what you're accessing, it might require it for, you know, a short session or a long session, or require the multifactor to be requested right then.
So it gives me a warm feeling every time it happens to me. And the prospect of implementing it in the native configuration of Mongo and Redis and SQL Server and Postgres, I would not look forward to that task. So I'm definitely a happy customer.
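As a minimal sketch of the MFA-required behavior described above, expressed as a plain policy check: a data store carries a tag, and access requires a sufficiently fresh multifactor confirmation. The tags and timeouts are invented for illustration; the real client enforces this automatically.

```python
# Hypothetical policy check: does this access need a fresh MFA challenge?
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

MFA_TTL = {"short": timedelta(minutes=15), "long": timedelta(hours=8)}

def needs_mfa(datasource_tags: Set[str],
              last_mfa_at: Optional[datetime],
              now: Optional[datetime] = None) -> bool:
    if "mfa_required" not in datasource_tags:
        return False
    if last_mfa_at is None:
        return True
    now = now or datetime.now(timezone.utc)
    ttl = MFA_TTL["short" if "highly_sensitive" in datasource_tags else "long"]
    return now - last_mfa_at > ttl

# Example: a highly sensitive store, last MFA an hour ago -> prompt again.
print(needs_mfa({"mfa_required", "highly_sensitive"},
                datetime.now(timezone.utc) - timedelta(hours=1)))   # True
```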
[00:35:43] Unknown:
And in terms of the different types of organizations and industries that you work with, what have been the differences in the ways that they react to your product and the capability of being able to grant access to these production systems and the overall approach that they have in terms of the policies that they maintain around granting access to data?
[00:36:07] Unknown:
So a lot of the reaction depends on, let's say, where the particular org is in their maturity cycle, particularly around compliance. So they'll be at some point in their maturity cycle on security, and then sort of separately on compliance. Anyone that has any compliance burden will sort of immediately relax when they see how accessible, how fingertip accessible, all the proof is. Because when you go through that quarterly or annual audit, you know, the test from the auditor is: show me when Joe Schmo was granted access to this data source. The fact that it's right there, you can get it, and you can show on a network diagram that there was no other way that Joe Schmo could have accessed that data source, it's a huge relief. So, really, just anyone that has a compliance regime in their industry immediately responds to that. I would particularly call out the fintech companies. They very much know that routine and very much appreciate how the product brings it together. And in terms of your experience
[00:37:13] Unknown:
of working on this product and working on building out the organization around it and working with customers, what have been some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:37:25] Unknown:
Sure. So, you know, what comes to mind is, and I would say this was a surprise, the degree to which this was true, but it's also sort of an affirmation that makes me feel good about humanity. So it's the virtuous cycle of working on, in our case, serious software, but I think this applies anywhere. If you are working on something serious, it tends to attract the type of talent that likes to work on serious problems. And if you're working on something serious and of high consequence, then you tend to bring your a game. And then people that are using whatever your product is tend to see that in your product. They tend to see that you approach it with seriousness, and then they tend to accept the idea that maybe they're happy to pay you for it. And then, because they're paying you for it, you can give it the kind of support and nurturing it needs.
So for example, instead of getting a form response back when you email customer support, you get an actual intelligent human that understands the product and that has empathy for your question. And so then that feeds back into the product definition itself and, again, attracts the kind of talent you need to get to the next phase. So the virtuous cycle of serious software.
[00:38:42] Unknown:
And looking forward, what do you have on the road map for the next phase of that cycle?
[00:38:48] Unknown:
Sure. Well, you know, we've been talking mostly about data stores, but really the StrongDM product is a proxy for really any system that you might need to access that has sensitive attributes. So for many of our customers, every day the only way they get to their servers is by using the SSH component of our product. So every keystroke they're typing in their production or their staging or their development environments is through our proxy, and all of those keystrokes are recorded. If you're a Windows shop, there's a different remote access protocol. It's called remote desktop. So, also, every day, folks that are managing .NET installations or managing a SQL Server database, the only way they get to those destinations is by using RDP through our system. And when it comes time to do an audit, we can actually replay the video of all of those Windows remote desktop sessions. Right? So if you basically think along those lines, every technical employee accessing any technical system, how do you right size their access, not over provision or under provision them, role based? And then how do you capture and report on all the evidence you need to audit it? That's the vector that the product is on. And are there any other aspects
[00:40:03] Unknown:
of the strongDM product or database management or data access or system access controls that we didn't cover yet that you'd like to discuss further before we close out the show?
[00:40:14] Unknown:
No, I think you got it. I think you asked all the right questions, and, yeah, I hope you have a chance to use the product, because, as an everyday user, I know you'd have to pry it from my cold dead fingers.
[00:40:30] Unknown:
Yeah. As somebody who works in operations, I can definitely recognize a lot of the benefits of your product, so I'll certainly be taking a closer look at it. And for anybody who wants to follow up with you and get in touch and keep up to date with the work that you're doing, I'll have you add your preferred contact information to the show notes. Great. And as a final question, I'd like to get your perspective
[00:40:50] Unknown:
on what you see as being the biggest gap in the tooling or technology that's available for data management today. Okay. So I'll say we sort of touched on this earlier, when we talked about the design of data stores, in some cases, not contemplating all of the authentication and access models that would ultimately be required of them. I think there's a related but more focused version of that around privacy. I think it's no secret, since GDPR went into effect, and, you know, I believe there was a product released just recently, maybe just yesterday, from Google on actually dealing with privacy in data stores. So I feel like, at some point, some elements of how we handle data broadly are going to need to be pushed down into data management in a standardized and unified way. So I would say that is a topic I don't have to advocate for. I think the industry
[00:41:46] Unknown:
is naturally asking and answering those questions now. Alright. Well, thank you very much for taking the time today to share the work that you've been doing with StrongDM and for all of your efforts to help us get better access control policies in place for our data storage and our back end systems. So I appreciate that, and I hope you enjoy the rest of your day. Alright. Thanks, Tobias. Take care. Bye.
Introduction and Welcome
Interview with Justin McCarthy
Founding Story of StrongDM
Challenges in Access Control
User Experience and Workflow
Common Workarounds and Solutions
Time-Based Access Control
Maintaining Communication with Data Engines
Onboarding and Integration
Customer Reactions and Policies
Future Roadmap and Closing Thoughts