Summary
The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business around making policy selection easier. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold
- As more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.
- You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
- Your host is Tobias Macey and today I'm interviewing Max Cho about the wild world of insurance companies and the challenges of collecting quality data for this opaque industry
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what CoverageCat is and the story behind it?
- What are the different sources of data that you work with?
- What are the most challenging aspects of collecting that data?
- Can you describe the formats and characteristics (3 Vs) of that data?
- What are some of the ways that the operational model of insurance companies has contributed to the industry's opacity from a data perspective?
- Can you describe how you have architected your data platform?
- How have the design and goals changed since you first started working on it?
- What are you optimizing for in your selection and implementation process?
- What are the sharp edges/weak points that you worry about in your existing data flows?
- How do you guard against those flaws in your day-to-day operations?
- What are the most interesting, innovative, or unexpected ways that you have seen your data sets used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on insurance industry data?
- When is a purely statistical view of insurance the wrong approach?
- What do you have planned for the future of CoverageCat's data stack?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)
- Neo4J: ![NODES Conference Logo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/PKCipYsh.png) NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks: - Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript - Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g.: graph neural networks, responsible AI) - Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation) Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to [Neo4j.com/NODES](https://Neo4j.com/NODES) today to see the full agenda and register!
- Materialize: ![Materialize](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/NuMEahiy.png) You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing. Go to [materialize.com](https://materialize.com/register/?utm_source=depodcast&utm_medium=paid&utm_campaign=early-access) today and get 2 weeks free!
- Datafold: ![Datafold](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/zm6x2tFu.png) This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting [dataengineeringpodcast.com/datafold](https://www.dataengineeringpodcast.com/datafold) today!
[00:00:11] Unknown:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.
With Materialize, you can. It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free. Your host is Tobias Macey, and today I'm interviewing Max Cho about the wild world of insurance companies, the challenges of collecting quality data for this opaque industry, and Max's journey through his career to get to where he is today. So, Max, can you start by introducing yourself? Hey, Tobias. Great to be here. I am the CEO of CoverageCat, a startup that is focused on enabling people to get the right insurance by actually just knowing the price of insurance, which sounds really dumb and really simple, but in fact it's an enormous data challenge. And do you remember how you first got started working in data?
[00:01:56] Unknown:
Well, okay. So, somewhat longer story, but my first job out of college was at Microsoft, working on figuring out what the worst bugs in Microsoft software are, which sounds like it ought to be straightforward but actually is awful as well. We built this enormous catalog of Microsoft bugs, but the catalog was far too large for anyone to grapple with. So you immediately run into the data question of, like, some VP asks you, okay, which of these bugs do I need to go... I mean, let's just choose an extreme example. If you're running Internet Explorer, you have, you know, 100 times more bugs than you have software engineers.
So you immediately run into the question of which of these bugs is worth fixing. And that's both a psychological question, in that some bugs make users much angrier than other bugs, and it's a question of, like, how do I possibly keep track of all this? How do I even know which are the real problems? That was the first thing that I got into, and we got into it in the era in which all of the data was being stored on Microsoft SQL Server. And let me assure you, Microsoft SQL Server has some opinions about how many millions of rows of data you should put in it for it to work well.
[00:03:11] Unknown:
I can imagine, especially earlier in its history. I'm sure at this point they've lifted some of those opinions or expanded them a bit. But
[00:03:20] Unknown:
It was during my time at Microsoft that we finally transitioned away from, like, a human who would log in to a server and restart the database if it crashed. So that was the era in which I entered my career, and today, I'm sure nobody does that. Well, I don't know. Maybe someone at Microsoft.
[00:03:40] Unknown:
You, intern over there. Go reboot the server.
[00:03:45] Unknown:
When our engineers were on vacation, there would be no one to log in to the servers and delete the log files that accumulated, so the servers would occasionally run out of hard drive space. I definitely got an email one morning saying, hey, Max, you need to log in to these servers and delete some crap just so that we have room to keep operating. And we were like, oh, yeah. See, look. It's not just random startups. It's also Microsoft doing this kind of thing.
[00:04:15] Unknown:
Absolutely. Yeah. You mean everything's not serverless now? They don't have disks. They just have the cloud. It's infinite.
[00:04:23] Unknown:
Well, I was reminded of when, you might recall, Twitter laid off 80% of its people. There were, you know, bets internally about, like, oh, what's gonna break first? And I thought the most reasonable bet was that some server somewhere is going to run out of disk space, because not all of this is instrumented and not all of it is automated. I don't know that we'll ever find out. We'll never get the postmortem on any of this, but it seems likely that's happened once or twice already.
[00:04:54] Unknown:
Absolutely. Or some runaway process that has a known memory leak decides to consume all of the available RAM, and the server's just sitting in deadlock somewhere.
[00:05:02] Unknown:
Mhmm. And I think people are like, oh, you know, if you were building this right, it would never happen. But, of course, when you actually work at a real company, nothing is built like that. Like, the chief advantage of companies with humans is that you can have the humans patch together all of this nonsense and then be the fail-safe when it breaks. Yeah. And there's always some other fire that's more important. You never go full automation in any company I've worked in. Absolutely. Yeah. With any industry, from the outside you think, oh, they've got everything covered, everything works smoothly. And then you actually work behind the curtain, and you say, how does any of this ever work? Yeah. But miraculously, I mean, against all odds, perhaps it does. Right? Exactly.
[00:05:45] Unknown:
And so in terms of what you're doing at CoverageCat now, you mentioned that you're just trying to figure out how much does this insurance cost and what is it actually going to do for me. I'm wondering if you can talk through some of the journey that brought you to that being the problem that you wanted to solve and the experience needed to actually tackle it.
[00:06:03] Unknown:
Yeah. This sounds incredibly naive, but in 2016 I was in a job interview for a different job, and by the end of the interview they had pretty much made up their minds that they wanted to hire me. So they were just killing some time and, you know, trying to entice me to come work at the company. And the last interview question was, like, if you could go do anything, what would you go do? And I was like, oh, I'd start an insurance company, which I guess is kind of an odd answer for people who are really excited about technology, because insurance is where a lot of technology ideas go to die. But to me, insurance is kind of the ideal software problem, the ideal data problem. You have this tremendous corpus of information, but you are functionally in the business of selling PDFs. Right? Insurance is a PDF that is a promise, a contract, about what it's gonna do for you. But the actual experience of buying insurance is so awful that it's like people driving down the highway, seeing the price of gas at different gas stations. Some of the gas stations are selling at $4 a gallon, and some of them are effectively selling at $40 a gallon. And you see people basically turning into the $40-a-gallon station. You're like, oh, if you just drove down the road, there'd be a much better option for you, a much better price.
But because there's so much opacity in the industry, most people don't realize that. And I was like, why can't we just find the prices of insurance? In personal lines of insurance, the prices are not like a gas station's, in that they can't change every day. These are prices that are approved by a regulator, and they're fixed. I think a lot of people come to me with the wrong notion about how insurance prices work. They go, like, oh, yeah, you know, I negotiated a great deal. There was no negotiation. The prices were set by regulatory rules, and you could have found the right deal if you just knew all the information.
[00:07:55] Unknown:
For the insurance industry, specifically, you mentioned that there's a lot of opacity. There is a lot of confusion when you're in the process of trying to shop for insurance of what is this actually going to cost me, what am I actually going to get out of it, is any of this worth it, or should I just go stuff all my money under a mattress and hope that I survive long enough to spend it? And I'm wondering if you can talk to some of the different sources of data that you are working with to try and answer these questions and some of the reasons for that opacity in the pricing aspect of insurance.
[00:08:27] Unknown:
Yeah. The first thing is that it is to the insurance incumbents' advantage that there is no price transparency. Right? Price transparency is what many insurance companies fear most. It creates the kind of world where, if you're a user of Google Flights, you appreciate that you can see a lot of flights and a lot of prices: oh, I gotta go from Dallas to San Francisco today, let's just look at the options, and Google will just show them to you. That's a historical relic of how travel agencies and the whole airline system built up all of their technology. But the insurance industry was fortunate enough, fortunate in their minds, to avoid that problem, and made it incredibly challenging for people to discover prices. There was actually even a law passed by voters in California to try to create such a thing. It's the legal obligation of the California Department of Insurance to create an online calculator that shows you insurance prices. But, hilariously, that calculator is so incomplete that if you work all the way through it, you always end up being a 30-year-old male driving a Toyota Corolla. And you're like, well, what if I'm not one of those things?
You can't see the prices. So we were like, okay, we know insurance prices are fixed. Well, they change maybe a couple of times a year at most, per company, per state. And we said, why can't we treat this as a data science and statistical problem in which we go and find millions of prices and then inform people about them, without them going through the whole quoting rigmarole and data collection? And it turns out, by the way, that most of the companies who do that kind of data collection are actually in the business of selling your name and your phone number to cold callers and junk mail, which is just really disappointing. And it speaks to why people, just like you said, throw up their hands at the complexity and go, oh, I guess I'm just gonna buy whatever.
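As an illustration of treating observed quotes as a statistical problem, here is a minimal Python sketch. It assumes, hypothetically, that a carrier's rate plan is multiplicative (a base rate times one factor per rating variable), so the log of the price is additive, and it recovers the relative factors from observed quotes by backfitting on log prices. The function names, the `age`/`vehicle` features, and the prices are illustrative assumptions, not CoverageCat's actual models.

```python
import math
from collections import defaultdict


def fit_rating_factors(quotes, features, n_iter=50):
    """Estimate a multiplicative rating plan from observed quotes.

    Assumes (hypothetically) price = base * prod(factor[feature][level]), so
    log(price) = intercept + sum of per-level effects, fitted by backfitting.
    `quotes` is a list of dicts holding each feature's level plus a "price".
    """
    log_prices = [math.log(q["price"]) for q in quotes]
    intercept = sum(log_prices) / len(log_prices)
    effects = {f: defaultdict(float) for f in features}

    for _ in range(n_iter):
        for f in features:
            # Average what is left of each log price after removing the
            # intercept and every *other* feature's current effect.
            sums, counts = defaultdict(float), defaultdict(int)
            for q, lp in zip(quotes, log_prices):
                others = sum(effects[g][q[g]] for g in features if g != f)
                sums[q[f]] += lp - intercept - others
                counts[q[f]] += 1
            new = {lvl: sums[lvl] / counts[lvl] for lvl in sums}
            # Move the quote-weighted mean effect into the intercept: fitted
            # values are unchanged, but the split stays identifiable.
            shift = sum(new[q[f]] for q in quotes) / len(quotes)
            intercept += shift
            effects[f] = defaultdict(float, {k: v - shift for k, v in new.items()})

    base = math.exp(intercept)
    factors = {f: {lvl: math.exp(v) for lvl, v in effects[f].items()} for f in features}
    return base, factors


def predict(base, factors, profile):
    """Price a new profile by multiplying the base by each matching factor."""
    price = base
    for feature, table in factors.items():
        price *= table[profile[feature]]
    return price


if __name__ == "__main__":
    # Hypothetical noise-free quotes generated from a known rate plan.
    true_age = {"young": 1.5, "adult": 1.0, "senior": 1.2}
    true_vehicle = {"sedan": 1.0, "suv": 1.3}
    quotes = [
        {"age": a, "vehicle": v, "price": 100 * fa * fv}
        for a, fa in true_age.items()
        for v, fv in true_vehicle.items()
    ]
    base, factors = fit_rating_factors(quotes, ["age", "vehicle"])
    print(round(predict(base, factors, {"age": "young", "vehicle": "suv"}), 2))  # prints 195.0
```

On real quotes the fit would not be exact: noise, interactions between factors, and rating variables you never observe all leave residual error, so predictions like these are rough estimates that can be refined as more observed prices come in.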
[00:10:30] Unknown:
For the ways that you're looking to bring transparency to the insurance market, what are the sources of data that you're looking to, how are you trying to incorporate them into a unified view, and what are some of the challenging aspects of getting access to that data and collecting it reliably?
[00:10:51] Unknown:
The challenges here are pretty immense. In many states, though not all, there's a system for filing your rates, and it's called SERFF. In theory, you could just go to SERFF and look up the PDF that the insurance company has filed describing its pricing algorithm. But it's not a nice algorithm written in Python that you can just go execute. It's an obfuscated scanned PDF where you're looking up random things in tables, and you're like, oh, what is going on? That is incredibly challenging. So what we do in practice is actually take the opposite view. We're like, we've observed millions of prices in the real world. Can we reverse engineer which features of our customers actually correlate with, and predict, large price changes? And then we build models that try to predict that. This came out of my experience working in finance at a hedge fund, where we're like, okay, let's throw up our hands about how the market works and just try to build statistical models that in fact work, that make money. Well, let's try to do the same thing for insurance. Rather than guaranteeing to the last penny that we're accurate about the prices, let's instead look at the key factors and make sure that we have those accounted for, so that we can show you prices that get progressively more accurate over time.
[00:12:21] Unknown:
You mentioned some of the incentives for insurance companies to make it opaque how much things actually cost. And I'm wondering, given that those incentives exist, how do they come into play in your efforts to actually bring that transparency? Are there artificial hurdles put in your way of gathering this information? Are there contractual elements of the ways that the information is stored and produced that make it difficult for you to use it in the way that you intend? I'm just wondering if you can talk to some of the nontechnical challenges that exist purely for the purpose of making this hard.
[00:12:47] Unknown:
Yeah. I think CoverageCat is only possible as a startup. And when I say it's only possible as a startup, what I mean is that right now we are happy to do a thing that cannot make us money for many of our customers. Like, 90 to 95 percent of the time, our customers come in and actually find the best insurance policy for them, and we cannot be paid for it. That's for a variety of reasons, but basically it's because we don't have a compensation deal negotiated with the carrier that is actually the best insurance company for them to choose. So no normal insurance company would actually do what we do, because they just wouldn't make money most of the time. Insurance agents predominantly are in the business of, you know, selling you a thing that they get paid for. And in fact, that's almost inherent in the name.
An agent is a salesperson, and they are not your agent. They are an agent of the insurance companies whose products they're trying to sell. CoverageCat really looks at this totally differently. We're like, it's okay if we don't make money almost all the time, because as long as we can make this a software problem rather than a human-salesperson problem, it's possible to do this at scale. And the very small portion of the time where we do make money can be enough to support the whole business. That's kind of where we came at this from a philosophical perspective. But, of course, if you go into the industry and talk like this, people think that you're completely bonkers. People are like, you're out there shilling products that you don't get paid for? What are you talking about? And we're like, well, it's just the best thing for our customers. And if we know that, we'll tell them about it.
[00:14:33] Unknown:
You mentioned that earlier in your career, you were in the interesting situation of having to figure out which bugs you should actually care about based on the data you were collecting. I'm wondering, given your experience now with CoverageCat and dealing with these artificial hurdles, what are some of the aspects of your career to this point that have prepared you for this insanity?
[00:14:49] Unknown:
Yeah. There is a general mindset, and I think this has changed in the industry, so I'm perhaps speaking to the beginning of my career rather than where we are today. But 10 years ago, I think there was a general mindset of "statistics are not that important to me as a programmer." I'm sure someone out there is already writhing in anger about, oh, statistics have always been important to programmers. But, you know, I joined Microsoft a decade ago, and the way that we measured the importance of a bug was the number of times that it had been observed. Just a standard hit counter: oh, let me just increment that, let me just increment that, blah blah blah. We were struggling for a number of reasons, but one of the obvious ways that that's insane is that you can use sampling. You can use statistics to mitigate this kind of problem so you don't have to collect an exabyte of data a day. And the statistical view, I think, has really taken over, of course. I don't need to sell anyone on it today. But it totally changes the viewpoint you might have if you enter something like insurance, where a much more traditionally oriented person would say, I can't ever show anyone a price until I've gathered every single actuarial detail about this person as a customer and produced the final exact quote.
And that is a view that I think is a disservice to the customer, because most of the time all that data is just gonna get thrown out: you could have told them from the first detail about them that they were never gonna buy that product, or that they were even ineligible for it. That kind of statistical view now permeates a lot of the work that we do with data at CoverageCat today.
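The sampling alternative to a raw hit counter can be sketched in a few lines of Python. This is a minimal illustration, assuming independent (Bernoulli) sampling of crash reports at a fixed rate; the bug IDs and counts are hypothetical, and it is not a description of Microsoft's actual telemetry. Each kept report stands in for `1/rate` reports, so scaling the sampled counts gives unbiased estimates of the true counts, which is enough to rank bugs while retaining only a small fraction of the raw events.

```python
import random
from collections import Counter
from itertools import chain, repeat


def sampled_bug_counts(events, rate, rng):
    """Keep each event with probability `rate` and scale the counts by 1/rate.

    Only the sampled events are counted, yet each scaled count is an
    unbiased estimate of how often that bug was really observed.
    """
    kept = Counter(bug_id for bug_id in events if rng.random() < rate)
    return {bug_id: n / rate for bug_id, n in kept.items()}


if __name__ == "__main__":
    # Hypothetical true crash counts; the event stream is generated lazily.
    true_counts = {"bug-A": 500_000, "bug-B": 100_000, "bug-C": 5_000}
    events = chain.from_iterable(repeat(b, n) for b, n in true_counts.items())
    estimates = sampled_bug_counts(events, rate=0.01, rng=random.Random(42))
    for bug_id in sorted(estimates, key=estimates.get, reverse=True):
        print(bug_id, round(estimates[bug_id]))
```

Rare bugs get noisier estimates than common ones (a bug seen 5,000 times leaves only about 50 sampled events at a 1% rate), which is usually acceptable when the question is which bugs are worth fixing first.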
[00:16:30] Unknown:
As far as the actual data that you're dealing with, I'm wondering if you can give some insight into the format it's available in and some of its characteristics, particularly in terms of the three Vs of volume, variety, and velocity.
[00:16:48] Unknown:
Yeah. The data for us basically comes in two important flavors. There's all of the personal data about our customers: details like your name and your birth date and all those kinds of things. And then there's all the data about the insurance products that they're buying. So that's coverage limits, deductibles, which riders they're getting attached, all of the details that functionally describe the policy, and, of course, the price and all those other elements.
The third element, which gets kind of weird, is that we collect all of this data while constantly fighting the schema changes of the carriers themselves, which change both the information they require and how they report these kinds of differences back to us. So in practice, we basically have this giant pool of raw data. We have a somewhat reasonable ETL pipeline that is pretending that all the carriers in the world are the same, and that we can do a better job of nicely formatting their data into a standard form to run models on.
And then at the end, we have this experimental pipeline in which our models make predictions, and then we have ground truth in the real world that validates the accuracy of those predictions. We're running a relatively standard data cycle in which you gather more data, process more of it, extract the right structured details so that you can run the modeling, see if your predictions were close, and then continue. And the net result is, I think, that we're basically the only insurance company on the Internet where you can come to us without telling us all of your information, and we'll give you a rough sense of carrier pricing and coverage, which is somewhat unusual.
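A minimal sketch of the kind of normalization layer described above, in Python. Everything here is hypothetical: the two carrier payload shapes, their field names, and the `NormalizedQuote` schema are invented for illustration, and CoverageCat's real pipeline is not described at this level of detail. The idea is one small adapter per carrier that maps its raw payload onto a single standardized record, and that fails loudly when a carrier's schema changes instead of silently producing bad rows for the models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedQuote:
    """Hypothetical standardized record that downstream models consume."""
    carrier: str
    annual_premium: float  # dollars per year
    deductible: int        # dollars
    liability_limits: str  # e.g. "100/300"


def _from_carrier_a(raw):
    # Hypothetical carrier A: flat payload, six-month premium in cents.
    return NormalizedQuote(
        carrier="carrier_a",
        annual_premium=raw["premium_6mo_cents"] * 2 / 100,
        deductible=int(raw["ded"]),
        liability_limits=raw["bi_limits"],
    )


def _from_carrier_b(raw):
    # Hypothetical carrier B: nested payload, monthly premium, "$1,000" strings.
    policy = raw["policy"]
    return NormalizedQuote(
        carrier="carrier_b",
        annual_premium=round(policy["monthlyPremium"] * 12, 2),
        deductible=int(policy["deductible"].lstrip("$").replace(",", "")),
        liability_limits=policy["coverage"]["bodilyInjury"],
    )


ADAPTERS = {"carrier_a": _from_carrier_a, "carrier_b": _from_carrier_b}


def normalize(carrier, raw):
    """Map one raw carrier payload onto the standardized schema."""
    adapter = ADAPTERS.get(carrier)
    if adapter is None:
        raise ValueError(f"no adapter registered for carrier {carrier!r}")
    try:
        return adapter(raw)
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        # A carrier-side schema change usually surfaces here; fail loudly
        # rather than feed a half-parsed record into the models.
        raise ValueError(f"{carrier} payload no longer matches the expected schema: {exc!r}") from exc
```

Keeping each adapter small and isolated means a schema change at one carrier breaks only that carrier's adapter, which is easier to spot and fix than a failure buried in a shared transformation.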
[00:18:43] Unknown:
As more people start using AI for projects, two things are clear. It's a rapidly advancing field, and it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26th featuring some of the brightest minds in tech. Check out the agenda and register today at neo4j.com/nodes. That's n e o, the number 4, j.com/nodes.
And so, for being able to actually work with this data and engineer around it, can you talk through some of the architecture of your data platform and the ways that you thought about prioritization: what am I trying to optimize for, what do I need to be able to deliver reliably, and what are the pieces that we can just put some duct tape on and muddle through for now?
[00:19:56] Unknown:
Well, as a startup, what are the things that you put duct tape on and muddle through? The answer is basically everything that you can possibly get away with, except for the places where the volume is too large. The moment you cross over millions or tens of millions or hundreds of millions of rows or elements of data, you're immediately in the land where non-programming approaches don't work. Right? It's completely unamenable to the human view of the world. Normal insurance agencies focus on having a single human, a single salesperson, who works through a single client's set of data and then produces a very small number of results. But you just really can't do that at this scale, and none of the existing tooling in the industry supports you if you want to do it the way that we do it. So we have effectively found ourselves building a lot of things that are unique in the insurance industry but would be common in software, I think, and in finance, definitely.
Does that fully answer what you were getting at there?
[00:21:05] Unknown:
Yeah. And also, because you're dealing with this somewhat bespoke area of the industry and specific types of data that pertain to it, how much of the data platform tooling have you been able to take off the shelf, and what are the pieces that you had to build from scratch because they were a custom requirement?
[00:21:22] Unknown:
I think the only thing that was extremely custom is interacting with insurance carriers, and that's because there are a couple of major vendors there that have really wormed their way in. If I have to go read more XML definitions of how to interface with particular insurance vendors, my eyeballs will bleed out. But, fortunately, once you're past that point, once the data is collected... By the way, I think everyone in the data engineering world loves to spend a bunch of time talking about the beautiful things that happen once you have the data, but of course there are so many ugly and gnarly things leading up to that which produce serious problems in how you actually analyze the data. I've been blessed that in almost all of my actual hands-on-keyboard, tippity-tapping work, my employers have done a great job on all the preprocessing, and I've had the good fortune to just have the relatively standard prediction problem of, okay, here's millions of rows of slop.
Please go convert these millions of rows of slop into something that we can actually run as a model that sits on our website as a live predictor. And that is actually relatively simple. That's a world in which I'm comfortable, and we're happy handling it right now all in Python and pandas, treating the problem as a relatively standard out-of-sample prediction task where your goal is to minimize error. There's some complexity in what minimizing error means; it's not necessarily simple MSE. But that's what happens on the back half.
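A minimal sketch of the kind of out-of-sample evaluation described above, using only the Python standard library rather than pandas so it runs anywhere. The row layout (`state`, `premium`), the group-mean baseline, the 80/20 holdout, and the example quotes are all illustrative assumptions, not Coverage Cat's actual pipeline; the point is that the same held-out predictions can be scored with different error measures.

```python
import random
from statistics import mean


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows reproducibly and split them into train/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def mse(actual, predicted):
    """Mean squared error: punishes large misses much more than small ones."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: every dollar of error counts the same."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_group_mean(train, key):
    """Toy baseline: predict the mean training premium for the row's group,
    falling back to the overall training mean for groups never seen in training."""
    groups = {}
    for row in train:
        groups.setdefault(row[key], []).append(row["premium"])
    overall = mean(row["premium"] for row in train)
    group_means = {group: mean(values) for group, values in groups.items()}
    return lambda row: group_means.get(row[key], overall)


# Hypothetical quotes, for illustration only.
rows = [
    {"state": "NY", "premium": 1800}, {"state": "NY", "premium": 2100},
    {"state": "NY", "premium": 1950}, {"state": "FL", "premium": 2600},
    {"state": "FL", "premium": 3100}, {"state": "TX", "premium": 1500},
    {"state": "TX", "premium": 1700}, {"state": "TX", "premium": 1650},
    {"state": "NY", "premium": 2000}, {"state": "FL", "premium": 2900},
]
train, test = holdout_split(rows)
model = fit_group_mean(train, "state")
predictions = [model(row) for row in test]
actual = [row["premium"] for row in test]
print(f"holdout MSE: {mse(actual, predictions):.1f}")
print(f"holdout MAE: {mae(actual, predictions):.1f}")
```

A constant prediction that minimizes MSE is the mean, while one that minimizes MAE is the median, so the choice of loss quietly changes what "best" means; which one a pricing model should minimize is as much a business decision as a statistical one.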
[00:22:52] Unknown:
And as you were going through the design process and technology selection, what are the things that influenced your decisions? Was it largely based on, well, this is the language that we're using for building the application, so we wanna bias towards that for our data stack? Was it that we just wanna get the cheapest thing that does what we need right now? I'm wondering if you can talk through that build versus buy aspect, and what biased you towards the tooling that you ended up settling on.
[00:23:25] Unknown:
Oh, we're wandering into controversy already, but I'm gonna say a rude thing here, which is that everyone thinks your problems are not that unique, so you ought to be able to buy software that solves them. And you certainly, as a startup, cannot write all the software you're gonna need, so you do have to buy a lot of software. But it turns out that almost every time you buy software, it's kind of junk. That's my opinion on buying software. You get this really lovely sales pitch for some cutting-edge B2B SaaS product that's gonna solve everything you need. Then you open the box, and it's like the "dead dove" scene from Arrested Development: you open it, and you're like, well, that's kind of on me now. So our view is basically that we try to use as much as possible that is standard, quality, and open source across the stack. And then the moment that you run into the, like, the...
[00:24:29] Unknown:
And as you have gone from the initial idea of, is this even possible? Let me just throw something together real quick as a proof of concept, to where you are now, with a business that relies on the technology stack you're running, how have the overall design and goals of the system changed in that process?
[00:24:47] Unknown:
Yeah. I think the gap between "I have a proof of concept" and "I have a production-level product" is so large that most people in management, at software companies or non-software companies, can't even imagine it. I don't know how many times you or I have seen a hackathon project where someone threw something together in 16 hours and it looks great. Then there's some managerial review, and somebody says, great, publish it or whatever. And the person who's done all this work is just ashen-faced, because they know the gulf between these two things is months. And our story is no different there.
I can actually remember one of the problems we immediately had: our models just started dying when people landed on our website, because real code executes to calculate these things. We had set up our system with certain expectations about the number of workers and the number of threads they would need, and we effectively DDoSed ourselves into oblivion at the beginning. What worked great on my machine, running an offline model that calculated a couple of things while I went to sleep and came back in the morning, was really not viable at scale.
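The self-inflicted overload described above is essentially a capacity-planning miss. A rough sketch, assuming requests arrive at a steady average rate, is to apply Little's law (requests in flight = arrival rate × time per request) and add headroom for bursts. The function name, the 1.5× headroom, and the example numbers are hypothetical.

```python
import math


def workers_needed(requests_per_second, seconds_per_request,
                   threads_per_worker=1, headroom=1.5):
    """Estimate how many worker processes a live prediction endpoint needs.

    Little's law: average requests in flight L = arrival rate (lambda) * time in system (W).
    `headroom` over-provisions for bursts; the 1.5 default is an arbitrary assumption.
    """
    if requests_per_second < 0 or seconds_per_request <= 0 or threads_per_worker < 1:
        raise ValueError("rate must be non-negative; latency and threads must be positive")
    in_flight = requests_per_second * seconds_per_request
    slots = math.ceil(in_flight * headroom)
    return max(1, math.ceil(slots / threads_per_worker))


# Hypothetical load: 20 quote requests per second, each spending 0.5 s in the model.
print(workers_needed(20, 0.5))                        # 15 single-threaded workers
print(workers_needed(20, 0.5, threads_per_worker=4))  # 4 workers with 4 threads each
```

In standard CPython, extra threads only add real capacity when the work is I/O-bound; CPU-heavy model scoring is serialized by the global interpreter lock, so for that case it is safer to count processes and leave `threads_per_worker` at 1.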
[00:26:07] Unknown:
Absolutely. One of my favorite tropes is that any software that is insufficiently broken will remain in production.
[00:26:16] Unknown:
Yeah, absolutely. In the trades, I believe the saying is that there's no such thing as a short-term fix. Right? If it's a short-term fix and it fixes it, well, it's a long-term fix. Exactly. Absolutely.
[00:26:31] Unknown:
You've got a system running in production right now. Obviously, there are sharp edges and skeletons in the closet, and I'm wondering what your approach has been to prioritizing and identifying which pieces of tech debt I care about, which pieces I need to care about, and which pieces I'm just going to pretend don't exist.
[00:26:52] Unknown:
Yeah. I think all early-stage startups are triaging in what I imagine is the same way an emergency room does. If you've ever been to the emergency room, one of the things I love about going is that it's a reminder of how inconsequential your health problems are. You'll go in feeling quite poorly, and they'll say, oh, it's a 2-hour wait. And you're like, 2 hours? That seems like a very long time. And then you see someone walking in with a gunshot wound, someone walking with a knife sticking out of their back, and you're like, oh, yeah, 2 hours seems pretty reasonable. So it's a great reminder of the perspective you can have on life, usually when you're having a pretty bad day.
I think we take a pretty similar approach. There are clearly enormous parts of the insurance industry that are very reluctant to embrace automation, for a variety of reasons: some political, some strategic, some technical. Frankly, the technical problems are the slightest ones, so we're like, okay, until Max's day can no longer support the number of clients asking us questions about this manual process, we're gonna keep doing things manually. The overall design of our company is basically that the people whose job it is to write code are fully async. They can manage their lives and work the way they need to work to get the product out. And then there's a portion of the company, which I am probably part of, that can be interrupted at a moment's notice for whatever crisis is going on. Oh, there's a hurricane bearing down on Florida, so we can't write new policies for 72 hours. That's just my day. Right?
And all of those things get dumped into the exception bucket, and the exception bucket at insurance companies is people. It's people like me.
[00:28:56] Unknown:
And as you have gotten further into the insurance industry, what are some of the hurdles or barriers that you couldn't see from the start of the marathon, and how have they thrown a wrench in your projected road map of, oh, this is what we think we're going to deliver this quarter, and, oh, shoot, that's never going to happen because of this thing that is completely bonkers?
[00:29:24] Unknown:
I'm gonna tell you a story, Tobias, that is truly mortifying if you come at this as a financial analyst. When we originally wrote our price prediction models, one of the key assumptions of the model was that you always pay more money for more insurance. That seems like a reasonable assumption, right? As your deductibles get lower, or as your coverages (the amount that the insurance company will pay out to you) increase, we told the model to assume the price is gonna go up: please never produce negative price changes for these kinds of things. And we spent about a week debugging why our accuracy kept getting worse in one particular case, asking, why does the model keep insisting that prices should go down as you buy more insurance? So I finally logged on to that insurance carrier's website, started manually clicking the buttons to buy more insurance, and actually saw my price going down. I was like, what on earth? As far as I can tell, something has gone enormously wrong in their actuarial model. But the real prices of insurance are not at all what you would expect.
And assumptions that seemed reasonable are just completely invalidated. At that point, we're like, okay, fine, I'll throw up my hands. I don't know what's going on here.
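Rather than hard-coding the "more coverage always costs more" assumption into a model, one option is to test raw quotes against it first. A minimal sketch, assuming quotes from one carrier for one applicant that differ only in coverage limit; the data layout and the example quotes are hypothetical.

```python
from typing import List, Tuple

# (coverage limit in dollars, annual premium in dollars) for ONE carrier and ONE
# applicant, varying only the coverage limit.
Quote = Tuple[float, float]


def monotonicity_violations(quotes: List[Quote]) -> List[Tuple[Quote, Quote]]:
    """Return neighbouring quote pairs where buying more coverage got cheaper."""
    ordered = sorted(quotes)  # sort by coverage limit (then premium)
    violations = []
    for lower, higher in zip(ordered, ordered[1:]):
        more_coverage = higher[0] > lower[0]
        cheaper = higher[1] < lower[1]
        if more_coverage and cheaper:
            violations.append((lower, higher))
    return violations


# Hypothetical quotes: the 100k limit comes back cheaper than the 50k limit.
quotes = [(25_000, 900.0), (100_000, 950.0), (50_000, 1_000.0)]
for lower, higher in monotonicity_violations(quotes):
    print(f"limit {higher[0]:,.0f} costs {higher[1]:.2f}, "
          f"less than limit {lower[0]:,.0f} at {lower[1]:.2f}")
```

Flagged pairs can be routed to a human for review; whether a model should then keep or drop a monotonic constraint for that carrier is a judgment call the check itself cannot make.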
[00:30:45] Unknown:
This episode is brought to you by Datafold, a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold today.
Another interesting aspect of insurance in particular is that you can break it down into a numerical model: if I give you this much money per month, you will give me this much money in the event of some claim. But there's also a lot of fine print, detail, and exception clauses, and there's never an apples-to-apples comparison between insurance providers or even insurance plans. I'm wondering how you tackle that: we can give you a price for this type of insurance, but even though this plan costs the same as that other plan, this one's actually better; or this one is maybe a 5% price increase, but in actuality there are aspects of it that far outweigh the cheaper plan in terms of what you'll actually be getting out of it. I'm curious how you've approached the detailed elements of the policy breakdown to help people understand which plan they actually want to buy for their life.
[00:32:33] Unknown:
This is true in the extremes, but actually I'm gonna bust a myth a little bit here. Insurance for individuals, for personal lines in the United States, is a relatively standardized industry. That's not to say there aren't lots of important details: there are six or seven pretty key variables in, let's say, a bog-standard US auto insurance policy that you really do need to know about. But past that point, it's such a commodity product for most people that it isn't really true. One of the myths the insurance industry likes to keep propagating is that policies are gonna be wildly different one way or another. You certainly see that in all the advertising. And of course that's reasonable, right? Their mission is to persuade you that it's not a commodity product, that there's significant differentiation.
Just to draw an example from real life: when a major disaster happens, you might think, oh, good thing I have carrier X, who's fabulous. All their adjusters are super well trained. It's gonna work out great for me. There is no way that carrier X has enough adjusters in your area to actually handle that crisis. So what do they do? Everyone draws from the shared pool of traveling adjusters, who are spread across all the companies, to handle these kinds of things. The net result is that the policies are highly similar, and in major catastrophes the treatment you get is, in my opinion, relatively similar, because the key people, the adjusters, are drawn from a commoditized pool. So I would say it's just not true. Policy details are very important, but it is a relatively commodity product, which makes it amenable to this kind of analysis.
[00:34:23] Unknown:
And given your point that personal insurance is fairly commoditized and standardized, what are the types of insurance that have been most conducive to this approach? How have you been thinking about bringing your capabilities to a broader variety of insurance types, or things that you might want to insure? And what are some of the elements of the insurance industry that are especially resistant to this statistical approach to gathering and analyzing information about who should buy what?
[00:35:06] Unknown:
There was an article earlier this year in The Wall Street Journal that really went against the grain of the narrative that I think a lot of venture capitalists are interested in right now, which is about AI and automation. This article was about the insurance industry, and I think the headline was "the humans have won." And that's basically true for most parts of the insurance industry. For about 20 years, people have been promising automation to replace the tens of thousands of human salespeople who run insurance agencies. And almost all of those efforts have failed; I'm not sure if I should put an asterisk on that, but almost all of them. I'm not saying that there's no automation, but it turns out the thing that sells bespoke insurance products is human salespeople on a phone.
But our view is that there are billions of dollars, about $500 billion a year, in personal lines of insurance that are not really bespoke products. The insurance industry is massive, trillions of dollars, but most of the industry, and actually a lot of the profit, is concentrated in these kinds of bespoke things. You hear stories like, you know, so-and-so hand model insures their hands on the Lloyd's marketplace. That's not ending up on Coverage Cat's automated stack anytime soon; I won't say never. There's no model I can write for those kinds of things, because they are extremely unusual. They're just contracts.
But a lot of the industry is a pretty commodity product. In the US, that's $500 billion a year of pretty commodity product from personal lines, and that's where we think we can crack the nut. It's not in areas like, oh, how do we handle construction risk for a new high-rise where they're using certain kinds of safety gear? That kind of complexity is really out of our scope.
[00:37:06] Unknown:
To that point about AI, I'm wondering if there has been any utility in the rampant growth of large language models, applying them to summarization of insurance policies to help with the data extraction and data aggregation for those more detailed elements that we were discussing.
[00:37:28] Unknown:
OpenAI wrote a report that basically pinned insurance as, like, the third most likely candidate for automation by LLMs. Third, fifth, something like that; one of the most likely candidates. I haven't seen it yet. That doesn't mean someone's not working on it; I'm sure many people are. But let's put the problems in order. The largest and most important problem in insurance is selling insurance, and LLMs are not actually very persuasive salespeople. If you've ever talked to an LLM, it's never really enticed you to buy anything. It kinda just came off as an idiot. Right?
So that's problem number one. Problem number two is support and claims automation. That actually seems much more reasonable to me, but you're gonna run into the problem that claims actually aren't that frequent. Right? You sell a lot more policies than have claims, hopefully, and so you are competing against cheap humans. And anytime you're competing against cheap humans, some of the time the cheap humans win. Right? They're just cheaper than the LLM, and, of course, sometimes they do the job better. So we'll see. I'm sure progress will be made here, but I'm not sure it's gonna come at the pace or at the market size that some people are anticipating.
[00:38:49] Unknown:
Now, digging into the failure modes of your work and the ways that things can go wrong, given that you are selling people insurance that they don't care about until they really care about it: I'm wondering if you can talk through how you think about error conditions, failure modes, data quality, and the "we really need to make sure this thing works well" piece.
[00:39:14] Unknown:
At the end of the line, when you go and buy with Coverage Cat, there's actually a step in which I review every recommendation we make and click approve. So right now there's a human at the end, me, who makes sure of that. And there are also reasons for that: we carry insurance to make sure that we don't make a professional liability error. So not only is this the right thing to do for our customers, it's also somewhat necessary. That's how we do it today.
I would actually say we do better than most human agents. We review policies every day from people who send us what some other human sold them, and those policies are often egregiously flawed, because you're in a hurry to buy insurance, you don't really want to understand it, you don't know, and the agent has an incentive to just sell you more. So I see policies where people have half a million dollars of personal property in their home insured, and many people don't really have half a million dollars of stuff in their house. It's kind of hard to have half a million dollars of stuff, especially if it's covered at actual cash value, not replacement value. Your $100 bed frame from IKEA: in actual cash value, how much is a used IKEA bed frame really going for these days? You see these kinds of problems everywhere. If I can step away from the data engineering crowd for a moment, I'm just gonna impart some pearls of insurance wisdom. There are two mistakes that I see everywhere in insurance.
When you are poorer, you need less third-party insurance, because it's very hard for people to sue you for millions of dollars when you have nothing, but you need more first-party insurance, because you can be bankrupted when, you know, your car is hard to replace. As you get richer, please stop buying insurance for tiny little things. Just self-insure: your bank account is your safety net, and that avoids all the claims hassle. It avoids having claims on your record, so you won't see the price hikes from them, blah blah blah. But the thing that you need to worry about is your third-party liability.
Suddenly you're walking around with a target on your back, where everyone knows that if something bad happens, they can sue you for all you're worth. And you always wanna have enough insurance that they would rather settle with your insurer than go to trial and take all your savings. So that's basically what the Coverage Cat optimizer does in terms of picking the right insurance limits; that's what we specialize in. In fact, almost all of our clients should be buying less insurance at the low end and more insurance at the high end, in terms of what kind of catastrophe is covered.
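The "less insurance at the low end, more at the high end" advice can be sketched as two simple rules. This is a hypothetical heuristic for illustration, not the Coverage Cat optimizer: pick the smallest available liability limit that covers the assets a plaintiff could go after (approximated here by net worth), and pick the largest deductible the household could comfortably self-insure (assumed here to be at most 10% of liquid savings).

```python
from typing import Dict, Sequence


def recommend_limits(net_worth: float, liquid_savings: float,
                     liability_options: Sequence[float],
                     deductible_options: Sequence[float],
                     self_insure_fraction: float = 0.10) -> Dict[str, object]:
    """Hypothetical rule of thumb for choosing a liability limit and a deductible."""
    limits = sorted(liability_options)
    # High end: smallest offered limit at least as large as the assets at risk.
    covering = [limit for limit in limits if limit >= net_worth]
    liability = covering[0] if covering else limits[-1]
    # Low end: largest deductible the household could pay out of pocket.
    budget = liquid_savings * self_insure_fraction
    affordable = [d for d in sorted(deductible_options) if d <= budget]
    deductible = affordable[-1] if affordable else min(deductible_options)
    return {
        "liability_limit": liability,
        "deductible": deductible,
        # True when even the largest offered limit is below the assets at risk.
        "needs_more_liability": not covering,
    }


print(recommend_limits(
    net_worth=400_000, liquid_savings=20_000,
    liability_options=[100_000, 300_000, 500_000, 1_000_000],
    deductible_options=[250, 500, 1_000, 2_500],
))
```

Both thresholds are deliberately crude; the sketch only shows the direction of the trade-off: push the deductible up to what savings can absorb, and push the liability limit up toward what could be lost in a lawsuit.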
[00:42:07] Unknown:
As you've been working with this data and with how it has a real-world impact, what are some of the most interesting, innovative, or unexpected ways that you have seen the data you're aggregating used, whether for decision-making or further analysis? I'm interested in some of the unexpected outcomes of your work.
[00:42:30] Unknown:
I'm trying to think if there was a real unexpected outcome. I think probably the largest one I've seen is that many of our clients file insurance claims that they shouldn't have, and they file them because no one's in their corner advising them. To pick on a specific example: if you ask your insurance agent, the person who sold you a policy and who represents the carrier, "should I file this claim?", they're not really gonna give you a sensible answer. And I've seen clients file a claim for a $200 chip in their windshield that has produced literally thousands of dollars of increased insurance costs over the years.
Any reasonable person who knew how prices change when you file claims would have said, don't do that; just replace your windshield yourself. So we ended up building a claims calculator to let our clients get a rough sense of how their prices will change if they file a claim, and decide whether it's worth it. There are so many rules of thumb in the insurance industry that are just not grounded in data, and our goal is to be the data-rich provider of those kinds of answers, to bust the myths and end the guesswork.
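The claims calculator itself isn't described in detail, but the underlying comparison can be sketched: a claim is only worth filing if what the insurer pays out exceeds the extra premium you expect to pay because of it. The function name, the flat percentage surcharge, and its duration are hypothetical inputs; real surcharges vary by carrier, state, and claim type, and some claims may not be surcharged at all.

```python
def claim_net_benefit(damage: float, deductible: float, annual_premium: float,
                      surcharge_rate: float, surcharge_years: int) -> float:
    """Payout from filing minus the assumed extra premium caused by the claim.

    Assumes the claim raises the premium by `surcharge_rate` (e.g. 0.2 = 20%)
    for `surcharge_years` years, with no other price changes.
    """
    payout = max(damage - deductible, 0.0)
    extra_premium = annual_premium * surcharge_rate * surcharge_years
    return payout - extra_premium


# Hypothetical: a $200 windshield chip, $0 deductible, $1,500/year premium,
# and an assumed 20% surcharge lasting 3 years.
net = claim_net_benefit(200, 0, 1_500, 0.2, 3)
print(f"net benefit of filing: {net:.2f}")
print("file the claim" if net > 0 else "pay out of pocket")
```

Under these made-up numbers the claim costs far more in future premiums than it pays out, which is the situation the speaker describes.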
[00:43:58] Unknown:
And in your experience of building this company, working through these datasets, and figuring out how to actually extract useful insight from this morass of opacity, what are the most interesting, unexpected, or challenging lessons that you've learned in the process?
[00:44:15] Unknown:
I think one of the key advantages of startups is that you can seriously underestimate how difficult the task you're trying to do is. When we started this business, I was like, well, it's gonna be pretty straightforward. We have written financial models. We know how to do some data gathering. This is not gonna work out... It just turns out there's so much crap buried there. If you're an incumbent, if you've lived that life, you know that's true. Even I, from my background, knew that was gonna be true, but I was willing to put on my blinders and pretend it was gonna be simple and pretend that we were good. I was like, oh yeah, in a year we're gonna have a really accurate price model from, you know, millions of good things. No. It's harder than that. And if you wanna create your own company or startup, maybe there's a lucky few for whom it's a very straightforward process.
But for us at least, and for most people I think, you have to pretend it's going to be easier than it actually is, because if you knew how hard it would be, you might not step out the door.
[00:45:23] Unknown:
Given your experience of taking a very data-driven approach to the insurance industry, what are the cases where a purely statistical view of the market is the wrong approach and you actually need to be more detail-oriented? And what are the aspects of the industry that are resistant to scale?
[00:45:41] Unknown:
At the end of the day, you are ultimately producing, as we just talked about, a PDF, and that PDF must perfectly and accurately represent the information of your client. You cannot take a statistical view of the number of children your client has; you have to actually answer things. So there is this friction as you go from a very rough, blurry sketch to the last bit of translating that blurry sketch into the final bound policy PDF, and that part remains not amenable to statistics. That's just a data collection piece, in our view.
[00:46:14] Unknown:
And as you continue to invest in this area and build Coverage Cat, what are some of the things that you have planned for the future of your data stack and your product suite, and some of the thorny problems that you're excited to dig into?
[00:46:35] Unknown:
One of my favorite things in insurance is how you can mix, match, and combine insurance. Bless the people out there running billions of dollars of ads saying "bundle and save." But in many cases, bundle and save is a misnomer, because, sure, you save $50 for bundling, but the actual cost if you had bought a la carte would have been thousands of dollars different. And you can combine policies in interesting ways to produce millions of dollars in coverage at much lower prices. So the thing that I am really excited about, as Coverage Cat goes forward, is customers who have complicated insurance needs: how can we take a data-driven approach to actually optimizing that? That's a much more sophisticated optimization problem. It's not merely moving the slider on the Allstate auto bar; it's a question of which carriers can comprise the appropriate set of coverage for an individual's risk tolerance and net worth.
And we can see tremendous savings there. I see clients who spend tens of thousands of dollars a year on insurance, most of which is unnecessary, or where a lot of that risk mitigation could be produced in a different way. That's where the big opportunity for our clients is.
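A toy version of the mix-and-match question: given per-carrier quotes for each line of insurance and a flat bundle discount when one carrier writes every line, enumerate the carrier assignments and keep the cheapest. Carrier names, prices, and the flat-discount model are hypothetical, and a real optimizer would also vary coverage levels rather than treating each quote as equivalent.

```python
from itertools import product
from typing import Dict, Optional, Tuple


def cheapest_combination(
    quotes: Dict[str, Dict[str, float]],
    bundle_discounts: Dict[str, float],
) -> Optional[Tuple[float, Dict[str, str]]]:
    """Brute-force the cheapest carrier for each line of insurance.

    quotes: {line: {carrier: annual premium}}
    bundle_discounts: {carrier: dollars off when that carrier writes every line}
    Returns (total premium, {line: carrier}), or None if some line has no quotes.
    """
    lines = sorted(quotes)
    best = None
    for carriers in product(*(sorted(quotes[line]) for line in lines)):
        total = sum(quotes[line][carrier] for line, carrier in zip(lines, carriers))
        if len(set(carriers)) == 1:  # one carrier writes every line: bundle applies
            total -= bundle_discounts.get(carriers[0], 0.0)
        if best is None or total < best[0]:
            best = (total, dict(zip(lines, carriers)))
    return best


quotes = {
    "auto": {"CarrierA": 1_200.0, "CarrierB": 900.0},
    "home": {"CarrierA": 1_000.0, "CarrierB": 1_400.0},
}
print(cheapest_combination(quotes, {"CarrierA": 50.0, "CarrierB": 50.0}))
```

With these made-up prices the split assignment beats either bundle, echoing the "bundle and save is a misnomer" point. Brute force grows as carriers raised to the number of lines, which is fine for a handful of personal lines; a larger problem would call for a proper integer-programming formulation.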
[00:47:59] Unknown:
And are there any aspects of the work that you're doing at Coverage Cat, the overall space of the insurance market, the available data and the ways it can be used to drive insight and purchasing decisions, or any other aspects of this problem space that we didn't discuss yet that you would like to cover before we close out the show?
[00:48:18] Unknown:
Oh, yeah. I think our ultimate goal is really simple. Today, when you buy insurance, and I talk to people about this every day, most people open up their web browser and Google, you know, "auto insurance in New York." They click on the first link. That link tells them a bunch of information that is reasonable but not terribly useful. And if they're lucky, at some point in this process they end up buying a sane policy. They usually pick the details of the policy themselves, and they don't choose the right things; they don't know. The eventual goal is just, like, Google should do a better job here. In my mind, Coverage Cat is the right way to buy: we will tell you about the things you should be buying, show you the prices from the many options, and optimize for you. I hope, 20 or 30 years down the line, what we're doing is the way that people actually buy insurance, and that there is not an army of 60,000 salespeople beating down your door to sell you stuff you don't need.
[00:49:10] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:49:38] Unknown:
I think a lot of people have really great ideas about how data pipelines ought to work, but I've seen very little that shows me we are actually making them accessible, stable, and manageable. It's really like we're building on a foundation of sand in many of these places, and the most ambitious ideas to change that have either not succeeded or certainly not percolated into the mainstream. When you think about how companies really do data management today, it's probably either a SQL variant or spreadsheets. That tells me there remains an enormous opportunity for someone to create a product that is usable by everyone and that lets you know how the data you're managing is working.
You know, I think Microsoft Access didn't succeed, but it had the right idea, and I hope someone comes up with something like that. And it's clearly not these consultancy-oriented, super complicated data management providers.
[00:50:48] Unknown:
Absolutely. Well, thank you very much for taking the time today to share your journey into this interesting and perilous space of insurance and to make it less opaque. I appreciate the time and energy you're putting into making that a tractable problem for the average person. So thank you again for that, and I hope you enjoy the rest of your day.
[00:51:09] Unknown:
Alright. Thanks, sir. Bye. You take care.
[00:51:17] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
[00:51:56] Unknown:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up to date.
With Materialize, you can. It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free. Your host is Tobias Macey. And today, I'm interviewing Max Cho about the wild world of insurance companies, the challenges of collecting quality data for this opaque industry, and Max's journey through his career to get to where he is today. So, Max, can you start by introducing yourself?
Hey, Tobias. Great to be here. I am the CEO of Coverage Cat, a startup that is focused on enabling people to get the right insurance by actually just knowing the price of insurance, which sounds really dumb and really simple, but in fact it's an enormous data challenge.
And do you remember how you first got started working in data?
[00:01:56] Unknown:
Well, okay. So somewhat longer story, but, my first job out of college was at Microsoft, and it was working on what are the worst bugs in in Microsoft software, which sounds like it ought to be straightforward, but actually is awful as well. We built this just enormous catalog of Microsoft bugs, but the catalog was far too large for anyone to grapple with. So you just immediately run with the data question of, like, some VP asks you, okay. Which of these bugs do I need to go I mean, let let's let's just choose an extreme example. If you're running Internet Explorer, you have, you know, a 100 times more bugs than you have software engineers.
So you immediately run into the question of which of these bugs is worth fixing. That's both a psychological question, in that some bugs make users much more angry than other bugs, and a question of how do I possibly keep track of all this? How do I even know which are the real problems? That was the first thing that I got into, and we got into it in the era in which all of the data was being stored on Microsoft SQL Server. And let me assure you, Microsoft SQL Server has some opinions about how many millions of rows of data you should put in it for it to work well.
[00:03:11] Unknown:
I can imagine, especially earlier in its history. I'm sure at this point, they've lifted some of those opinions or expanded them a bit. But...
[00:03:20] Unknown:
It was during my time at Microsoft that we finally transitioned away from having a human who would log in to a server and restart the database if it crashed. So that was the era in which I entered my career, and today, I'm sure nobody does that. Well, I don't know. Maybe someone at Microsoft.
[00:03:40] Unknown:
You, intern over there. Go reboot the server.
[00:03:45] Unknown:
When our engineers were on vacation, there would be no one who would go log in to the servers and delete the log files that accumulated, so the servers would occasionally run out of hard drive space. I definitely got sent an email one morning saying, hey, Max, you need to go log in to these servers and delete some crap just so that we have room to keep operating. And we're like, oh, yeah, see, look. It's not just random startups. It's also Microsoft doing this kind of thing.
[00:04:15] Unknown:
Absolutely. Yeah. You mean everything's not serverless now? They don't have disks. They just have the cloud. It's infinite.
[00:04:23] Unknown:
Well, you might recall when Twitter laid off 80% of its people. There were bets internally about what's going to break first. And I thought the most reasonable bet was that some server somewhere is going to run out of disk space, because not all of this is instrumented and not all of it's automated. I don't know that we'll ever find out. We'll never get the postmortem on any of this, but it seems likely that's happened once or twice already.
[00:04:54] Unknown:
Absolutely. Or some runaway process that has a known memory leak decided to consume all of the available RAM, and the server's just sitting in deadlock somewhere.
[00:05:02] Unknown:
Mhmm. And I think people say, oh, if you were building this right, it would never happen. But, of course, when you actually work at a real company, nothing is built that way. So the chief advantage of companies with humans is that you can have the humans patch together all of this nonsense and then be the fail-safe when it breaks. Yeah. And there's always some other fire that's more important. You never go full automation in any company I've worked in. Absolutely. Yeah. With any industry, from the outside you think, oh, they've got everything covered, everything works smoothly, and then you actually work behind the curtain and you say, how does any of this ever work? Yeah. But miraculously, against all odds, perhaps it does. Right? Exactly.
[00:05:45] Unknown:
And so in terms of what you're doing at Coverage Cat now, you mentioned that you're just trying to figure out how much this insurance costs and what it is actually going to do for me. I'm wondering if you can talk through some of the journey that brought you to that being the problem you wanted to solve, and the experience needed to actually tackle it.
[00:06:03] Unknown:
Yeah. This sounds incredibly naive, but in 2016, I was in a job interview for a different job, and at the end of the interview, they had pretty much made up their mind that they wanted to hire me. So they were kind of just killing some time and trying to entice me to come work at the company. And the last question was, if you could go do anything, what would you go do? And I was like, oh, I'd start an insurance company, which I guess is kind of an odd answer for people who are really excited about technology, because insurance is where a lot of technology ideas go to die. But to me, insurance is kind of the ideal software problem, the ideal data problem. You have this tremendous corpus of information, but you are functionally in the business of selling PDFs. Right? Insurance is a PDF that is a promise, a contract, about what it's gonna do for you. But the actual experience of buying insurance is so awful that it's like people driving down the highway, seeing the price of gas at different gas stations. Some of the gas stations are selling for $4 a gallon, and some of them effectively are selling at $40 a gallon. And you see people basically turning into the $40 a gallon station. You're like, oh, if you just drove down the road, there'd be a much better option for you, a much better price.
But because there's so much opacity in the industry, most people don't realize that. And I was like, why can't we just find the prices of insurance? In personal lines of insurance, the prices are not like a gas station's in that they can't change every day. These are prices that are approved by a regulator, and they're fixed. I think a lot of people come to me with the wrong notion about how insurance prices work. They say, oh yeah, I negotiated a great deal. There was no negotiation. The prices were set by regulatory rules, and you could have gone and found the right deal if you just knew all the information.
[00:07:55] Unknown:
For the insurance industry, specifically, you mentioned that there's a lot of opacity. There is a lot of confusion when you're in the process of trying to shop for insurance of what is this actually going to cost me, what am I actually going to get out of it, is any of this worth it, or should I just go stuff all my money under a mattress and hope that I survive long enough to spend it? And I'm wondering if you can talk to some of the different sources of data that you are working with to try and answer these questions and some of the reasons for that opacity in the pricing aspect of insurance.
[00:08:27] Unknown:
Yeah. The first thing is that it is to the insurance incumbents' advantage that there is no price transparency. Right? Price transparency is what many insurance companies fear most. It creates the kind of world where, maybe if you're a user of Google Flights, you appreciate that you can see a lot of flights and a lot of prices: oh, I gotta go from Dallas to San Francisco today, let's just look at what the options are, and Google will just show that to you. But that's a historical relic of how travel agencies and the whole airline system built up their technology. The insurance industry was fortunate enough, fortunate in their mind, to avoid that problem, and made it incredibly challenging for people to go and discover prices. There was actually even a law passed by voters in California to try to create such a thing. It's the legal obligation of the California Department of Insurance to create an online calculator that shows you insurance prices. But, hilariously, that calculator is so incomplete that if you work all the way through it, you always end up being a 30-year-old male driving a Toyota Corolla. And you're like, well, what if I'm not one of those things?
You can't go see the prices. So we were like, okay, we know insurance prices are fixed. Well, they change a couple of times a year at most, maybe, but they're set per company, per state. And we said, why can't we treat this as a data science and statistical problem in which we go and find millions of prices and then inform people about them without their going through the whole quoting rigmarole and data collection? And it turns out, by the way, that most of the companies who do that kind of data collection are actually in the business of selling your name and your phone number to cold callers and junk mail, which is just really disappointing. And it speaks to why people, just like you said, throw up their hands at the complexity and say, oh, I guess I'm just gonna do whatever. I'll just buy whatever.
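One way to read "treat this as a statistical problem" is to assume a multiplicative rating structure (premium = base rate × one relativity per rating factor) and estimate the relativities from observed quotes. The sketch below does that for a single factor using geometric means. The quote data, attribute names, and the multiplicative assumption are all hypothetical, not Coverage Cat's actual model, and the simple ratio is only unbiased when the other factors are balanced across levels, as they are in this toy data.

```python
import math
from collections import defaultdict

# Hypothetical observed quotes: (customer attributes, annual premium in USD).
QUOTES = [
    ({"age_band": "25-34", "vehicle": "sedan"}, 1200.0),
    ({"age_band": "25-34", "vehicle": "suv"}, 1440.0),
    ({"age_band": "18-24", "vehicle": "sedan"}, 2400.0),
    ({"age_band": "18-24", "vehicle": "suv"}, 2880.0),
]


def estimate_relativities(quotes, attribute, base_level):
    """Estimate multiplicative relativities for one attribute.

    Assumes premium = base * product(relativities), so log-premiums are
    additive; the geometric mean at each level, divided by the base level's,
    estimates that level's relativity when other factors are balanced.
    """
    logs = defaultdict(list)
    for attrs, premium in quotes:
        logs[attrs[attribute]].append(math.log(premium))
    # Geometric mean of observed premiums at each level of the attribute.
    geo_means = {level: math.exp(sum(v) / len(v)) for level, v in logs.items()}
    base = geo_means[base_level]
    return {level: gm / base for level, gm in geo_means.items()}


if __name__ == "__main__":
    print(estimate_relativities(QUOTES, "age_band", "25-34"))
    print(estimate_relativities(QUOTES, "vehicle", "sedan"))
```

With this toy data the younger age band comes out at roughly twice the base rate and SUVs at roughly 1.2 times sedans; with real, unbalanced quote data you would fit all factors jointly, for example with a log-linear regression.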
[00:10:30] Unknown:
For the ways that you're looking to bring transparency to the insurance market, what are the sources of data that you're looking to, the ways that you are trying to incorporate it into a unified view, and some of the challenging aspects of getting access to that data and collecting it reliably?
[00:10:51] Unknown:
The challenges here are pretty immense. In many states, though not all, there's a system for filing your rates called SERFF. In theory, you could just go to SERFF and look up the PDF that the insurance company has filed that describes their pricing algorithm. And it's not like a nice algorithm written in Python that you can just go execute. It's this obfuscated scanned PDF where you're looking up random things in tables, and you're like, oh, what is going on? That is incredibly challenging. So what we do in practice is take the opposite view: we've observed millions of prices in the real world. Can we reverse engineer which features of our customers actually correlate with and predict large price changes? And then we build models that try to predict that. This came out of my experience working in finance at a hedge fund, where we're like, okay, let's throw up our hands about how the market works and just try to build statistical models that in fact work, that make money. Let's try to do the same thing for insurance. Rather than guaranteeing to the last penny that we're accurate about the prices, let's instead look at the key factors and make sure that we have those accounted for, so that we can show you prices that get progressively more accurate over time.
[00:12:21] Unknown:
You mentioned some of the incentives for insurance companies to make it opaque as to how much things actually cost. And I'm wondering, given that those incentives exist, how do they come into play in your efforts to actually bring that transparency? Are there artificial hurdles put in your way of gathering this information? Are there contractual elements of the ways that information is stored and produced that make it difficult for you to use it in the way that you intend? I'm just wondering if you can talk to some of the nontechnical challenges that exist purely for the purpose of making this hard.
[00:12:47] Unknown:
Yeah. I think Coverage Cat is only possible as a startup. And when I say it's only possible as a startup, what I mean is that right now, we are happy to do a thing that cannot make us money for many of our customers. Like, 90 to 95 percent of the time, our customers come in and actually find the best insurance policy for them, and we cannot be paid for it. That's for a variety of reasons, but basically it's because we don't have a compensation deal negotiated with the carrier that is actually the best insurance company for them to choose. So no normal insurance company would actually go do what we do, because they just wouldn't make money most of the time. Insurance agents predominantly are in the business of selling you a thing that they get paid for. And in fact, that's almost inherent in the name.
An agent is a salesperson, and they are not your agent. They are an agent of the insurance companies whose products they're trying to sell. Coverage Cat looks at this totally differently. We're like, it's okay if we don't make money almost all the time, because as long as we can make this a software problem rather than a human salesperson problem, then it's possible to do this at scale. And the very small portion of the time where we do make money can be enough to support the whole business. That's how we came at this from a philosophical perspective. But, of course, if you go into the industry and talk like this, people think that you're completely bonkers. People are like, you're out there shilling products that you don't get paid for? What are you talking about? And we're like, well, it's just the best thing for our customers. And if we know that, we'll tell them about it.
[00:14:33] Unknown:
You mentioned that earlier in your career, you were in the interesting situation of having to figure out which bugs you should actually care about based on the data you were trying to collect. I'm wondering, given your experience now with Coverage Cat and dealing with these artificial hurdles, what aspects of your career to this point have prepared you for this insanity?
[00:14:49] Unknown:
Yeah. There is a general mindset, and I think this has changed in the industry, so I'm perhaps speaking to the beginning of my career rather than where we are today. But 10 years ago, I think there was a general mindset that statistics are not that important to me as a programmer. I'm sure someone out there is already writhing in anger about, oh, statistics have always been important to programmers. But I joined Microsoft a decade ago, and the way that we measured the importance of a bug was the number of times that it had been observed. Just a standard hit counter: let me just increment that, let me just increment that, blah blah blah. We were struggling for a number of reasons, but one of the obvious ways that that's insane is that you can use sampling. You can use statistics to mitigate this kind of problem so you don't have to collect an exabyte of data a day. And the statistical view, I think, has really taken over. I don't need to sell anyone on it today, but it has totally changed the viewpoint you might have if you enter something like insurance, where a much more traditionally oriented person would say, I can't ever show anyone a price until I've gathered every single actuarial detail about this person as a customer and produced the final exact quote.
And that is a view that I think is a disservice to the customer because most of the time, all that data is gonna just get thrown out because you could have told them from the first detail about them that they were never gonna buy that product, that they were even ineligible for that. And that kind of statistical view now permeates a lot of the work that we do with data at Coverage Cat today.
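The sampling alternative to a hit counter can be sketched in a few lines: keep each crash report with a fixed probability and scale the kept counts back up by the inverse of that probability, which gives an unbiased estimate of each bug's true frequency. Uniform per-event sampling and the bug IDs below are illustrative assumptions, not a description of Microsoft's actual telemetry.

```python
import random
from collections import Counter


def sampled_bug_counts(events, rate, seed=0):
    """Estimate per-bug hit counts from a Bernoulli sample of events.

    Each event is kept independently with probability `rate`; dividing the
    kept counts by `rate` makes each estimate unbiased for the true count.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    kept = Counter(bug_id for bug_id in events if rng.random() < rate)
    return {bug_id: count / rate for bug_id, count in kept.items()}


if __name__ == "__main__":
    # Hypothetical stream: one very common crash and one rarer hang.
    events = ["crash_a"] * 200_000 + ["hang_b"] * 20_000
    print(sampled_bug_counts(events, rate=0.01, seed=42))
```

Storing 1% of reports still ranks the frequent bugs correctly; the trade-off is noisier estimates for rare bugs, which is why real systems often sample per bug signature rather than uniformly.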
[00:16:30] Unknown:
As far as the actual data specifically that you're dealing with, I'm wondering if you can give some insight into the format that it is available in and some of its characteristics, particularly in terms of the three Vs of volume, variety, and velocity.
[00:16:48] Unknown:
Yeah. The data for us basically comes in two important flavors. There's all of the personal data about our customers: details like your name and your birth date and all those kinds of things. And then there's all the data about the insurance products that they're buying. That's coverage limits, deductibles, which riders they're getting attached, all of the details that functionally describe the policy, and, of course, the price and all those other elements.
The third element that gets kind of weird is that we also collect all of this data while constantly fighting the schema changes of the carriers themselves, who keep changing the information they require and how they report these differences back to us. So in practice, we basically have this giant pool of raw data. We have a somewhat reasonable ETL pipeline that pretends that all the carriers in the world are the same and tries to nicely format their data into a standardized shape to run models on.
And then at the end, we have this experimental pipeline in which our models are making predictions, and we have ground truth in the real world that validates the accuracy of those predictions. We're running a relatively standard data cycle: gather more data, process more of it, extract the right structured data so that you can run the modeling, see if your predictions were close, and then continue. And the net result is, I think, that we're basically the only insurance company on the Internet where you can come to us without telling us all of your information, and we'll give you a rough sense of carrier pricing and coverage, which is somewhat unusual.
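The "pretend all carriers are the same" ETL step usually amounts to a per-carrier field mapping into one canonical record, plus a loud failure when a carrier's schema drifts. A minimal sketch, assuming two hypothetical carriers with invented field names; real carrier payloads differ and are often XML rather than flat dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Carrier-agnostic quote record used downstream for modeling."""
    carrier: str
    annual_premium: float
    deductible: int
    liability_limit: int


# Hypothetical per-carrier mappings from canonical names to source field names.
FIELD_MAPS = {
    "carrier_a": {"premium": "annualPremium", "deductible": "collDed", "limit": "bodilyInjuryLimit"},
    "carrier_b": {"premium": "monthly_premium", "deductible": "deductible_usd", "limit": "bi_limit"},
}
MONTHLY_CARRIERS = {"carrier_b"}  # hypothetical: carriers that report monthly premiums


def normalize(carrier, raw):
    """Map one raw carrier payload (a dict of strings) onto a Quote."""
    fields = FIELD_MAPS.get(carrier)
    if fields is None:
        raise KeyError(f"no schema mapping for carrier {carrier!r}")
    missing = [src for src in fields.values() if src not in raw]
    if missing:
        # Surface schema drift loudly instead of silently emitting bad rows.
        raise ValueError(f"{carrier}: missing fields {missing}")
    premium = float(raw[fields["premium"]])
    if carrier in MONTHLY_CARRIERS:
        premium *= 12  # standardize everything to annual premiums
    return Quote(carrier, premium, int(raw[fields["deductible"]]), int(raw[fields["limit"]]))


if __name__ == "__main__":
    print(normalize("carrier_b", {"monthly_premium": "100", "deductible_usd": "500", "bi_limit": "100000"}))
```

Keeping the mappings as data rather than code means a carrier's schema change becomes a one-line edit plus a failing check, instead of a silent shift in the training data.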
[00:18:43] Unknown:
As more people start using AI for projects, two things are clear. It's a rapidly advancing field, and it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26th featuring some of the brightest minds in tech. Check out the agenda and register today at neo4j.com/nodes. That's n e o, the number 4, j.com/nodes.
And so for being able to actually work with this data and engineer around it, can you talk through some of the architecture of your data platform and how you thought about prioritization: what am I trying to optimize for, what do I need to be able to deliver reliably, and what are the pieces that we can just put some duct tape on and muddle through for now?
[00:19:56] Unknown:
Well, as a startup, what are the things that you put the duct tape on and muddle through? The answer is basically everything that you can possibly get away with, except for the places where the volume is too large. The moment you cross over millions, tens of millions, or hundreds of millions of rows or elements of data, you're immediately in the land where non-programming approaches don't work. It's completely non-amenable to the human view of the world. Normal insurance agencies focus on having a single human, a single salesperson, who works through a single client's set of data and then produces a very small number of results. But you just really can't do that, and none of the existing tooling in the industry supports you if you want to do it the way that we do it. So we have effectively found ourselves building a lot of things that are unique in the insurance industry but would be common in software, I think, and in finance, definitely.
Does that fully answer what you were getting at there?
[00:21:05] Unknown:
Yeah. And also, because you're dealing with this somewhat bespoke area of the industry and the specific types of data that pertain to it, how much of the data platform tooling have you been able to take off the shelf, and what are the pieces that you had to build from scratch because they were a custom requirement?
[00:21:22] Unknown:
I think the only thing that was extremely custom is interacting with insurance carriers, where there are a couple of major vendors that have really wormed their way in. If I have to go read more XML definitions of how to interface with particular insurance vendors, my eyeballs will bleed out. But, fortunately, once you're past that point, once the data is collected... By the way, I think everyone in the data engineering world loves to spend a bunch of time talking about the beautiful things that happen once you have the data, but, of course, there are so many ugly and gnarly things leading up to that which produce serious problems in how you actually analyze the data. I've been blessed that, in almost all of my actual hands-on-keyboard, tippity-tapping work, my employers have done a great job on all the preprocessing, and I've had the good fortune to just have the relatively standard prediction problem of, okay, here's millions of rows of slop.
Please go convert these millions of rows of slop into something that we can actually run as a model that sits on our website as a live predictor. And that is actually relatively simple. That is basically a world in which I'm comfortable; we're happy handling it right now all in Python and pandas and treating the problem as a relatively standard out-of-sample prediction task where your goal is to minimize error. There's some complexity in what minimization of error means, since it's not necessarily simply MSE, but that's what happens on the back half.
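To make "out-of-sample, and not necessarily MSE" concrete, the sketch below scores a deliberately naive baseline (the median observed price per state) on held-out quotes with both MSE and mean absolute percentage error. It uses the standard library rather than pandas, and the baseline, the state grouping, and MAPE as the alternative metric are all assumptions chosen for illustration, not Coverage Cat's actual model or loss.

```python
import statistics


def mse(actual, predicted):
    """Mean squared error in squared dollars."""
    return statistics.fmean((a - p) ** 2 for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error: misses are weighted relative to price."""
    return statistics.fmean(abs(a - p) / a for a, p in zip(actual, predicted))


def fit_median_by_group(train):
    """Toy baseline: predict the median observed price for each group (e.g. state)."""
    groups = {}
    for group, price in train:
        groups.setdefault(group, []).append(price)
    medians = {g: statistics.median(ps) for g, ps in groups.items()}
    fallback = statistics.median(price for _, price in train)
    return lambda group: medians.get(group, fallback)


def evaluate_out_of_sample(rows, train_frac=0.75):
    """Fit on the first part of `rows`, then score only on the held-out rest."""
    cut = int(len(rows) * train_frac)
    train, test = rows[:cut], rows[cut:]
    model = fit_median_by_group(train)
    actual = [price for _, price in test]
    predicted = [model(group) for group, _ in test]
    return {"mse": mse(actual, predicted), "mape": mape(actual, predicted)}


if __name__ == "__main__":
    # Hypothetical (state, annual premium) observations in collection order.
    rows = [("TX", 1000.0), ("TX", 1200.0), ("CA", 2000.0), ("CA", 2400.0),
            ("TX", 1100.0), ("CA", 2200.0), ("TX", 1100.0), ("CA", 2000.0)]
    print(evaluate_out_of_sample(rows))
```

MAPE treats a $200 miss on a $2,000 policy as worse than the same miss on a $5,000 policy, which may track what a shopper comparing quotes cares about better than squared dollars do.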
[00:22:52] Unknown:
And as you were going through the design process and technology selection, what are the things that influenced your decisions? Was it largely based on, well, this is the language that we're using for building the application, so we want to bias towards that for our data stack? Was it, we just want the cheapest thing that does what we need right now? I'm just wondering if you can talk through that build versus buy aspect, and what biased you towards the tooling that you ended up settling on.
[00:23:25] Unknown:
Oh, we're wandering into controversy already, but I'm gonna say a rude thing here, which is: everyone thinks that your problems are not that unique, so you ought to be able to buy software that solves them. And you certainly, as a startup, cannot write all the software you're gonna need, so you do have to buy a lot of software. But it just turns out that almost every time you buy software, it's kind of junk. That's my opinion on buying software. You get this really lovely sales pitch about some cutting-edge B2B SaaS product that's gonna solve everything you need. And then you go and open the box, and it's like the scene from Arrested Development: dead dove inside. You open it, and you're like, well, that's kind of on me now. So I think our view is basically that we try to use as much as we can that is standard, that is quality, and that is open source across the stack. And then the moment that you run into the, like, the...
[00:24:29] Unknown:
And as you have gone from the initial idea of, is this even possible? Let me just throw something together real quick that acts as a proof of concept to where you are now, where you have a business that relies on the technology stack that you're running on. How have the overall design and goals of the system changed in that process?
[00:24:47] Unknown:
Yeah. I think the gap between "I have a proof of concept" and "I have a production-level product" is so large that most people in management at software companies or non-software companies can't even imagine it. I don't know how many times you or I have seen a hackathon project where someone threw something together in, like, 16 hours, and it looks great. Then there's some managerial review, and somebody's like, great, publish it or whatever. And the person who's done all this work is just ashen-faced, because they know the gulf between these two things is months. And our story is no different there.
I can actually remember one of the problems we immediately had: our models started dying. When people land on our website, real code is executing to try to calculate these things. And we had set up our system with certain expectations about the number of workers and the number of threads we would need. We effectively nuked, or DDoSed, ourselves into oblivion at the beginning. What worked great on my machine, running an offline model and calculating a couple of things while I went to sleep and came back in the morning, was really not viable at scale.
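Little's law gives a back-of-the-envelope check for this kind of failure: the average number of requests in flight equals the arrival rate times the time each request spends in the system, so a model that takes seconds per page view needs far more concurrent workers than an overnight batch job ever did. A hedged sketch with hypothetical traffic numbers; Little's law describes steady-state averages, so the target utilization below is an assumed safety margin for bursts.

```python
import math


def workers_needed(arrivals_per_sec, seconds_per_request, target_utilization=0.7):
    """Estimate concurrent workers via Little's law (L = lambda * W).

    L is the average number of requests in flight; dividing by a target
    utilization below 1.0 leaves headroom, since traffic arrives in bursts.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    in_flight = arrivals_per_sec * seconds_per_request
    return math.ceil(in_flight / target_utilization)


if __name__ == "__main__":
    # Hypothetical: 20 visitors per second, 2 seconds of model compute each.
    print(workers_needed(20, 2.0))
```

The same arithmetic also shows the two levers available: add workers, or cut the per-request compute, for example by precomputing or caching predictions.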
[00:26:07] Unknown:
Absolutely. Yeah. One of my favorite tropes is that any software that is insufficiently broken will remain in production.
[00:26:16] Unknown:
Yeah. Absolutely. Like, in the trades, I believe the saying is that there's no such thing as a short-term fix. Right? If it's a short-term fix and it fixes it, well, it's a long-term fix. Exactly. Absolutely.
[00:26:31] Unknown:
You've got a system. It's running in production right now. Obviously, there are sharp edges and skeletons in the closet, and I'm wondering what your approach has been to prioritizing and identifying which pieces of tech debt I care about, which pieces I need to care about, and which pieces I'm just going to pretend don't exist.
[00:26:52] Unknown:
Yeah. I think all early-stage startups triage in what I imagine is the same way an emergency room does. If you've ever been to the emergency room, one of the things that I love about it is that it's a reminder of how inconsequential your health problems are. You'll go in there feeling quite poorly, and they'll be like, oh, it's a 2-hour wait. And you're like, 2 hours? That seems like a very long time. And then you see someone walking in with a gunshot wound, someone walking in with a knife sticking out of their back, and you're like, oh, yeah, 2 hours seems pretty reasonable. So it's a great reminder of perspective in life, usually when you're having a pretty bad day.
And I think we take a pretty similar approach. There are clearly enormous parts of the insurance industry that are very reluctant to embrace automation, for a variety of reasons: some political, some strategic, some technical. Frankly, the technical problems are the slightest ones. So we're like, okay, until Max's day can no longer support the number of clients asking us questions about this manual process, we're gonna keep doing things manually. The overall design of our company is basically that the people whose job it is to write code are fully async. They can manage their life and work the way that they need to work to get the product out. And then there's a portion of the company, which I'm probably part of, that can be interrupted at a moment's notice for whatever crisis is going on. Oh, there's a hurricane bearing down on Florida, so we can't write new policies for 72 hours. That's just my day. Right?
And all of those things get dumped into the exception bucket, and the exception bucket in insurance companies is people. It's people like me.
[00:28:56] Unknown:
And as you have gotten further into the insurance industry, what are some of the hurdles or barriers that you've run into that you couldn't see from the start of the marathon, and some of the ways that they have thrown a wrench in your projected road map of, oh, this is what we think we're going to deliver this quarter, and, oh, shoot, that's actually never going to happen because of this thing that is completely bonkers?
[00:29:24] Unknown:
I'm gonna tell you a story, Tobias, that is truly mortifying if you come at this as, like, a financial analyst. When we originally wrote our price prediction models, one of the key assumptions of the model was that you always pay more money for more insurance. That seems like a reasonable assumption, right? As your deductibles get lower or your coverages increase, like the amount that the insurance company will pay out to you, we told the model to assume the price is gonna go up. Like, please never produce negative prices for these kinds of things. And we spent about a week debugging why our accuracy kept getting worse in this one particular case, where we're like, okay, why does the model keep insisting that prices should go down as you buy more insurance? So I finally logged on to that particular insurance carrier's website, started manually clicking the buttons and buying more insurance, and actually saw my price going down. I was like, what on earth? As far as I can tell, something has gone enormously wrong in their actuarial model. But the real prices of insurance are not at all what you would expect.
And assumptions that seemed reasonable in the are just completely invalidated. And at that point, we're just like, okay, fine, I'll throw up my hands. I don't know what's going on here.
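Rather than hard-coding "more coverage always costs more" into a model, one option is to test that assumption against observed quotes first. The sketch below flags adjacent coverage levels where a richer policy came back cheaper. It assumes quotes for otherwise-identical customer profiles and uses hypothetical liability limits; for deductibles you would sort on the negated deductible, since a lower deductible means more insurance.

```python
def monotonicity_violations(quotes):
    """Return adjacent (lower_level, higher_level) quote pairs where more
    coverage was priced lower.

    `quotes` is a list of (coverage_level, price) tuples for otherwise
    identical customer profiles at a single carrier.
    """
    ordered = sorted(quotes)
    return [
        (lower, higher)
        for lower, higher in zip(ordered, ordered[1:])
        if higher[0] > lower[0] and higher[1] < lower[1]
    ]


if __name__ == "__main__":
    # Hypothetical liability limits (USD) and annual premiums from one carrier.
    observed = [(50_000, 900.0), (100_000, 1000.0), (250_000, 950.0), (500_000, 1100.0)]
    for lower, higher in monotonicity_violations(observed):
        print(f"limit {higher[0]:,} costs {higher[1]:.2f}, less than {lower[0]:,} at {lower[1]:.2f}")
```

Running a check like this per carrier before training makes it possible to relax a monotonic constraint only where the real rate tables violate it, instead of debugging it indirectly through degrading accuracy.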
[00:30:45] Unknown:
This episode is brought to you by Datafold, a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold today.
Another interesting aspect of insurance in particular is that you can break it down to this numerical model of, if I give you this much money per month, you will give me this much money in the event of some claim. But there's also a lot of fine print and detail and worked-in exception clauses, and there's never an apples-to-apples comparison between insurance providers or even insurance plans. I'm wondering how you try to tackle some of those aspects of, okay, we can give you a price for this type of insurance, but even though this insurance plan costs the same as this other insurance plan, this one's actually better, or this one is maybe a 5% price increase.
But in actuality, there are aspects of it that make it far outweigh the cheaper plan in terms of what you'll actually be getting out of it. I'm just curious how you've tried to approach the question of the detailed elements of the policy breakdown, to help people understand which plan they actually want to buy for their life.
[00:32:33] Unknown:
This is true in the extremes, but I'm actually gonna myth-bust a little bit here. Insurance for individuals, for personal lines in the United States, is a relatively standardized industry. That's not to say that there aren't lots of important details. There are, like, 6 or 7 pretty key variables in, let's say, a bog-standard auto insurance policy in the US that you really do need to know about. But after that point, it's such a commodity product for most people that that isn't true. One of the things that I feel like the insurance industry likes to continue to propagate as a myth is the insistence that things are gonna be wildly different one way or another. You certainly see that in all the advertising that is put out there. And, of course, that's reasonable, right? Their mission is to persuade you that it's not a commodity product, that there's significant differentiation.
But just to draw an example from real life: when a major disaster happens, you might think, good thing I have carrier X, who's fabulous. All their adjusters are super well trained, so it's going to work out great for me. But there is no way that company X has enough adjusters in your area to actually handle that crisis. So what do they do? Everyone draws from the shared pool of traveling adjusters, who are spread across all the companies, to handle these kinds of events. The net result is that the policies are highly similar, and in major catastrophes the treatment you get is, in my opinion, relatively similar, because the key people, the adjusters, are drawn from a commoditized pool. So I would say it's just not true. Policy details are very important, but insurance is a relatively commodity product, which makes it amenable to this kind of analysis.
[00:34:23] Unknown:
And given that, to your point, personal insurance is fairly commoditized and standardized, what are the types of insurance that have been most conducive to this approach to problem solving? How have you been thinking about bringing your capabilities to a broader variety of insurance types, or to other things people might want to insure? And what elements of the insurance industry are especially resistant to this statistical approach to gathering and analyzing information about who should buy what?
[00:35:06] Unknown:
There was an article earlier this year in The Wall Street Journal that really went against the grain of the narrative a lot of venture capitalists are interested in right now, which is AI and automation. The article was about the insurance industry, and I think the headline was that the humans have won. That's basically true for most parts of the insurance industry. For about 20 years, people have been promising automation to replace the tens of thousands of human salespeople who run insurance agencies, and almost all of those efforts (maybe I should put an asterisk there) have failed. I'm not saying there's no automation, but it turns out that what sells bespoke insurance products is human salespeople on the phone.
But our view is that there are billions of dollars, about $500 billion a year, in personal lines of insurance that are not really bespoke products. The insurance industry is massive, trillions of dollars, but most of the industry, and actually a lot of the profit, is concentrated in these kinds of bespoke things. You hear stories like some hand model insuring their hands through the Lloyd's marketplace. That's not ending up on an automated stack like Coverage Cat anytime soon; I won't say never. There's no model I can write for those kinds of things, because they are extremely unusual. They're just contracts.
But a lot of the industry is a pretty commodity product. In the US, that's $500 billion a year of pretty commodity product in personal lines, and that's where we think we can crack the nut. It's not in areas like how to handle construction risk for a new high-rise where they're using certain kinds of safety gear. That kind of complexity is really out of our scope.
[00:37:06] Unknown:
To that point about AI, I'm wondering whether the rapid growth of large language models has been useful for summarizing insurance policies, to help with the data extraction and aggregation for those more detailed elements we were discussing.
[00:37:28] Unknown:
OpenAI wrote a report that basically pinned insurance as something like the third, or maybe fifth, most likely candidate for automation by LLMs. I haven't seen it happen yet. That doesn't mean no one's working on it; I'm sure many people are. But if you've ever dealt with LLMs, you know there are problems, so let me put them in order. The largest and most important problem in insurance is selling insurance, and LLMs are not actually very persuasive salespeople. If you've ever chatted with an LLM, it's never really enticed you to buy anything. It kind of just comes off as an idiot, right?
So that's problem number one. Problem number two is support and claims automation. That actually seems much more reasonable to me, but you run into the problem that claims aren't that frequent. You sell a lot more policies than you have claims, hopefully, so you're competing against cheap humans. And anytime you're competing against cheap humans, some of the time the cheap humans win. They're just cheaper than the LLM, and if they can do the job better, of course they win. So we'll see. I'm sure progress will be made here, but I'm not sure it's going to come at the pace or the market size that some people are anticipating.
[00:38:49] Unknown:
Now, digging into the failure modes of your work and the ways things can go wrong, given that you're selling people insurance they don't care about until they really care about it: I'm wondering if you can talk through how you think about error conditions, failure modes, data quality, and the "we really need to make sure this thing works well" piece.
[00:39:14] Unknown:
At the end of the line, when you go and buy with Coverage Cat, there's a step in which I personally review every recommendation we make and click approve. So right now there's a human at the end, me, who makes sure of that. There are also other reasons for it: we carry insurance against professional liability errors, so this is not only the right thing to do for our customers, it's also somewhat necessary. So that's how we do it today.
I would actually say we do better than most human agents. We review policies every day from people who send us what some other human sold them, and those policies are often egregiously flawed, because you're in a hurry to buy insurance, you don't really want to understand it, you don't know, and the agent has an incentive to just sell you more. I see policies where people have half a million dollars of personal property in their home insured, and many people don't really have half a million dollars of stuff in their house. It's hard to have that much stuff, especially if it's insured at actual cash value rather than replacement value. Take your $100 bed frame from IKEA: at actual cash value, how much is a used IKEA bed frame really going for these days? You see these kinds of problems everywhere. If I can step outside the data engineering topic for a moment and impart some pearls of insurance wisdom, there are two mistakes I see everywhere in insurance.
When you are poorer, you need less third-party insurance, because it's very hard for people to sue you for millions of dollars when you have nothing, but you need more first-party insurance, because you can be bankrupted when, say, your car is hard to replace. As you get richer, please stop buying insurance for tiny little things. Just self-insure: your bank account is your safety net. That avoids all the claims hassle, it avoids having claims on your record, you won't see the price hikes from them, and so on. The thing you need to worry about instead is your third-party liability.
Suddenly you're walking around with a target on your back, where everyone knows that if something bad happens, they can sue you for all you're worth. You always want enough insurance that they would rather settle with your insurer than go to trial and take all your savings. That's basically what the Coverage Cat optimizer does in terms of picking the right insurance limits; that's what we specialize in. In fact, almost all of our clients should be buying less insurance at the low end and more insurance at the high end, in terms of what kind of catastrophe is covered.
[00:42:07] Unknown:
As you've been working in this space, working with this data and with how it's going to have a real-world impact, what are some of the most interesting, innovative, or unexpected ways you have seen the data you're aggregating used, whether for decision making or further analysis? I'm interested in some of the unexpected outcomes of your work.
[00:42:30] Unknown:
I'm trying to think if there was a really unexpected outcome. Probably the largest one I've seen is that many of our clients file insurance claims they shouldn't have, and they file them because no one's in their corner advising them. To pick a specific example: if you ask your insurance agent, the person who sold you the policy and who represents the carrier, whether you should file a claim, they're not really going to give you a sensible answer. I've seen clients file a claim for a $200 chip in their windshield that produced literally thousands of dollars of increased insurance costs over the years.
And any reasonable person who knew how prices change when you file claims would have said, don't do that, just replace your windshield yourself. So we ended up building a claims calculator that gives our clients a rough sense of how their prices will change if they file a claim, so they can decide whether it's worth it. There are so many rules of thumb in the insurance industry that are just not grounded in data, and our goal is to be the data-rich provider of those kinds of answers: to myth-bust and end the guesswork.
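The trade-off behind a claims calculator like the one described can be sketched as simple arithmetic: filing is worthwhile only if the payout (damage minus deductible) exceeds the extra premium you pay while the claim stays on your record. This is not Coverage Cat's calculator. The surcharge percentage, how many years it lasts, and the flat, undiscounted treatment of future premiums are all hypothetical assumptions, since real surcharges vary by carrier and state.

```python
from dataclasses import dataclass


@dataclass
class ClaimScenario:
    damage: float          # repair cost, in dollars
    deductible: float      # policy deductible, in dollars
    annual_premium: float  # current yearly premium, in dollars
    surcharge_rate: float  # HYPOTHETICAL fractional premium increase after a claim, e.g. 0.20
    surcharge_years: int   # HYPOTHETICAL number of years the surcharge applies


def claim_payout(s: ClaimScenario) -> float:
    """What the insurer pays: damage above the deductible, never negative."""
    return max(0.0, s.damage - s.deductible)


def total_surcharge(s: ClaimScenario) -> float:
    """Extra premium paid over the surcharge window (flat, no discounting)."""
    return s.annual_premium * s.surcharge_rate * s.surcharge_years


def net_benefit_of_filing(s: ClaimScenario) -> float:
    """Positive means filing pays off under these assumptions; negative means it doesn't."""
    return claim_payout(s) - total_surcharge(s)


if __name__ == "__main__":
    # Made-up numbers: a $200 windshield chip, no deductible, a $2,000/yr premium,
    # and an assumed 20% surcharge lasting three years.
    chip = ClaimScenario(damage=200, deductible=0, annual_premium=2000,
                         surcharge_rate=0.20, surcharge_years=3)
    print(net_benefit_of_filing(chip))
```

With those made-up inputs, filing the $200 claim leaves the policyholder $1,000 worse off ($200 paid out against $1,200 of extra premium). Swapping in real surcharge data for a specific carrier is what would turn this from a toy into something useful.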
[00:43:58] Unknown:
And in your experience of building this company, working through these datasets, and figuring out how to extract useful insight from this morass of opacity, what are the most interesting, unexpected, or challenging lessons you've learned in the process?
[00:44:15] Unknown:
I think one of the key advantages of startups is that you can seriously underestimate how difficult the task you're trying to do is. When we started this business, I thought it was going to be pretty straightforward. We've written financial models; we know how to do some data gathering. This is not gonna work out? It just turns out there's so much crap buried there. And if you're an incumbent, if you've lived that life, you know that's true. Even I, from my background, knew that was going to be true, but I was willing to put on my horse blinders, pretend it was going to be simple, and pretend that we were good. I thought, oh yeah, in a year we're going to have a really accurate price model from, you know, millions of... No. It's harder than that. If you want to create your own company or your own startup, maybe there's a lucky few for whom it's a very straightforward process.
But for us at least, and I think for most people, you have to pretend it's going to be easier than it actually is, because if you knew how hard it would be, you might not step out the door.
[00:45:23] Unknown:
Given your experience working in this space and taking a very data-driven approach to the insurance industry, what are the cases where a purely statistical view of the market is the wrong approach and you actually need to be more detail oriented? And what are the aspects of the industry that are resistant to scale?
[00:45:41] Unknown:
At the end of the day, you are ultimately producing, as we just talked about, a PDF, and that PDF must perfectly and accurately represent the information of your client. You cannot take a statistical view of the number of children your client has; you actually have to answer those things. So there's friction in the transition from the very rough, blurry sketch to the last step of translating that sketch into the final bound policy PDF, and that remains not amenable to statistics. That's just a data collection piece, in our view.
[00:46:14] Unknown:
And as you continue to invest in this area and build Coverage Cat, what are some of the things you have planned for the future of your data stack and your product suite, and what are some of the thorny problems you're excited to dig into?
[00:46:35] Unknown:
One of my favorite things in insurance is how you can mix, match, and combine insurance. Bless the people out there running billions of dollars of ads saying "bundle and save," but in many cases "bundle and save" is a misnomer: sure, you save $50 for bundling, but the actual cost if you had bought it a la carte would have been thousands of dollars different. And you can actually combine policies in interesting ways to produce millions of dollars in coverage at much lower prices. So what I'm really excited about, as Coverage Cat continues forward, is the customers who have complicated insurance needs. How can we take a data-driven approach to actually optimizing that? That's a much more sophisticated optimization problem. It's not merely moving the slider on the Allstate auto bar; it's a question of which carriers can make up the appropriate set of coverage for an individual's risk tolerance and net worth.
And we can see tremendous savings there. I see clients who spend tens of thousands of dollars a year on insurance, most of which is unnecessary, or where a lot of that risk mitigation could be produced in a different way. That's where the big opportunity for our clients is.
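To make the "which carriers make up the right set of coverage" idea concrete, here is a toy brute-force sketch. It is not Coverage Cat's optimizer. The carriers, limits, and premiums are made up, and the rule that an umbrella limit simply stacks on top of each underlying policy is a hypothetical simplification (real umbrella policies typically impose minimum underlying limits, which this ignores).

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Quote:
    carrier: str          # hypothetical carrier name
    line: str             # "auto", "home", or "umbrella"
    liability_limit: int  # third-party liability coverage, in dollars
    premium: int          # annual premium, in dollars


# Sentinel meaning "buy no umbrella policy at all".
NO_UMBRELLA = Quote("none", "umbrella", 0, 0)


def cheapest_portfolio(
    quotes: list[Quote], target_liability: int
) -> Optional[tuple[int, tuple[Quote, Quote, Quote]]]:
    """Brute-force every (auto, home, umbrella) combination and return the
    cheapest one whose liability protection reaches the target, or None."""
    autos = [q for q in quotes if q.line == "auto"]
    homes = [q for q in quotes if q.line == "home"]
    umbrellas = [NO_UMBRELLA] + [q for q in quotes if q.line == "umbrella"]

    best = None
    for auto, home, umbrella in product(autos, homes, umbrellas):
        # Simplifying assumption: the umbrella limit stacks on top of each
        # underlying policy's limit (minimum-underlying rules are ignored).
        auto_ok = auto.liability_limit + umbrella.liability_limit >= target_liability
        home_ok = home.liability_limit + umbrella.liability_limit >= target_liability
        if not (auto_ok and home_ok):
            continue
        cost = auto.premium + home.premium + umbrella.premium
        if best is None or cost < best[0]:
            best = (cost, (auto, home, umbrella))
    return best


# Made-up quotes for illustration only.
EXAMPLE_QUOTES = [
    Quote("Carrier A", "auto", 100_000, 900),
    Quote("Carrier A", "auto", 500_000, 1_400),
    Quote("Carrier B", "home", 300_000, 1_000),
    Quote("Carrier B", "home", 1_000_000, 1_300),
    Quote("Carrier C", "umbrella", 1_000_000, 300),
]

if __name__ == "__main__":
    print(cheapest_portfolio(EXAMPLE_QUOTES, 1_000_000))
```

With these made-up numbers, the cheapest way to reach $1,000,000 of liability protection is the low-limit auto and home quotes plus the umbrella ($2,200 a year), and no combination without the umbrella reaches the target at all, which echoes the point about buying less at the low end and more at the high end. Brute force is fine for a handful of quotes; a real multi-carrier search would need a proper optimization formulation.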
[00:47:59] Unknown:
And are there any aspects of the work that you're doing at Coverage Cat, the overall space of the insurance market and the available data in ways that it can be used to drive insight and purchasing decisions or any other aspects of this problem space that we didn't discuss yet that you would like to cover before we close out the show?
[00:48:18] Unknown:
Oh, yeah. I think our ultimate goal is really simple. Today, when you buy insurance, and I talk to people about this every day, most people I know open up their web browser and Google, you know, "auto insurance in New York," and click on the first link. That link tells them a bunch of information that is reasonable but not terribly useful. If they're lucky, at some point in this process they end up buying a sane policy. They usually pick the details of the policy themselves, and they don't choose the right things, because they don't know. I think Google should do a better job here. The eventual goal is that Coverage Cat is, in my mind, the right way to buy: we will tell you about the things you should be buying, show you the prices from the many options, and optimize for you. I hope that 20 or 30 years down the line, what we're doing is the way people actually buy insurance, and that there isn't an army of 60,000 salespeople beating down your door to sell you stuff you don't need.
[00:49:10] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:49:38] Unknown:
I think a lot of people have really great ideas about how data pipelines ought to work, but I've seen very little that shows me we are actually making them accessible, stable, and manageable. It's like we're building on a foundation of sand in many of these places, and the most ambitious ideas to change that have either not succeeded or certainly not percolated into the mainstream. When you think about how companies really do data management today, it's probably either a SQL variant or spreadsheets. That tells me there remains an enormous opportunity for someone to create a product that is usable by everyone and that lets you know how the data you're managing is working.
You know, I think Microsoft Access didn't succeed, but it had the right idea, and I hope someone comes up with something like that. And it's clearly not going to be these consultancy-oriented, super complicated data management providers.
[00:50:48] Unknown:
Absolutely. Well, thank you very much for taking the time today to share your journey into this interesting and perilous space of insurance and making it less opaque. I appreciate the time and energy you're putting into making it a tractable problem for the average person. So thank you again for that, and I hope you enjoy the rest of your day.
[00:51:09] Unknown:
Alright. Thanks, sir. Bye. You take care.
[00:51:17] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and The Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to the Episode
Max Cho's Background and Career Journey
Challenges in the Insurance Industry
Data Transparency in Insurance
Data Platform Architecture
Prioritizing Tech Debt
Automation and AI in Insurance
Handling Error Conditions and Failure Modes
Future Plans for Coverage Cat