Summary
Postgres is one of the most widely respected and liked database engines ever. To make it even easier for developers to use, Nikita Shamgunov decided to make it serverless, so that it can scale from zero to infinity. In this episode he explains the engineering involved to make that possible, as well as the numerous details that he and his team are packing into the Neon service to make it even more attractive for anyone who wants to build on top of Postgres.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm interviewing Nikita Shamgunov about his work on making Postgres a serverless database at Neon.
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Neon is and the story behind it?
- The ecosystem around Postgres is large and varied. What are the pain points that you are trying to address with Neon?
- What does it mean for a database to be serverless?
- What kinds of products and services are unlocked by making Postgres a serverless database?
- How does your vision for Neon compare/contrast with what you know of PlanetScale?
- Postgres is known for having a large ecosystem of plugins that add a lot of interesting and useful features, but the storage layer has not been as easily extensible historically. How have architectural changes in recent Postgres releases enabled your work on Neon?
- What are the core pieces of engineering that you have had to complete to make Neon possible?
- How have the design and goals of the project evolved since you first started working on it?
- The separation of storage and compute is one of the most fundamental promises of the cloud. What new capabilities does that enable in Postgres?
- How does the branching functionality change the ways that development teams are able to deliver and debug features?
- Because the storage is now a networked system, what new performance/latency challenges does that introduce? How have you addressed them in Neon?
- Anyone who has ever operated a Postgres instance has had to tackle the upgrade process. How does Neon address that process for end users?
- The rampant growth of AI has touched almost every aspect of computing, and Postgres is no exception. How does the introduction of pgvector and semantic/similarity search functionality impact the adoption and usage patterns of Postgres/Neon?
- What new challenges does that introduce for you as an operator and business owner?
- What are the lessons that you learned from MemSQL/SingleStore that have been most helpful in your work at Neon?
- What are the most interesting, innovative, or unexpected ways that you have seen Neon used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Neon?
- When is Neon the wrong choice? Postgres?
- What do you have planned for the future of Neon?
- @nikitabase on Twitter
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Neon
- PostgreSQL
- Neon Github
- PHP
- MySQL
- SQL Server
- SingleStore
- AWS Aurora
- Khosla Ventures
- YugabyteDB
- CockroachDB
- PlanetScale
- Clickhouse
- DuckDB
- WAL == Write-Ahead Log
- PgBouncer
- PureStorage
- Paxos
- HNSW Index
- IVF Flat Index
- RAG == Retrieval Augmented Generation
- AlloyDB
- Neon Serverless Driver
- Devin
- magic.dev
[00:00:11] Tobias Macey:
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for. Starburst has complete support for all table formats, including Apache Iceberg, Hive, and Delta Lake. And Starburst is trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst today and get $500 in credits to try Starburst Galaxy, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey, and today I'm interviewing Nikita Shamgunov about his work on making Postgres a serverless database at Neon. So, Nikita, can you start by introducing yourself?
[00:01:01] Nikita Shamgunov:
Great being here. My name is Nikita. I'm working, like you said, on serverless Postgres. It's been a very fun journey. We started 3 years ago, on March 1, 2021, with 3 guys and a slide deck. And now Neon is a company with a 100 people and hundreds of thousands of databases under management.
[00:01:24] Tobias Macey:
And do you remember how you first got started working in data?
[00:01:28] Nikita Shamgunov:
Yes. I think I always was fascinated with databases. I started to use PHP and MySQL while still going to college, because I was kinda moonlighting and trying to make a little bit of money while studying computer science back home in Russia. And then my first real job, like, real real job was at SQL Server, which is a flagship database product at Microsoft. Before I joined there, I had no idea how large scale systems are being built. And I really cut my teeth on database architecture, fundamentals, how to get quality at SQL Server. So that was my first intro.
[00:02:09] Tobias Macey:
Now in terms of the Neon project, can you give a bit of an overview about what it is that you're building and some of the story behind how it came to be and why you decided that you want to spend your time and focus on it?
[00:02:21] Nikita Shamgunov:
I actually thought about Neon while working on SingleStore, my previous company. And SingleStore was designed to be a scale out system, and we had a very ambitious vision of becoming the system that can run both transactional workloads and analytical workloads in a kinda globally distributed system. Lots of learnings, certainly, on that project. And then at some point, I was seeing the rise of Postgres, because every single customer of SingleStore, and we had very large customers, had Postgres somewhere. Right? They would put some really large scale workload, usually either an analytical workload or a real time analytics workload, on SingleStore, and a part of the data feed that was going into that would always come from Postgres.
And so I was just seeing Postgres everywhere. Also at the same time, my prior mentor at SQL Server, whose name is Alex Rubitsky, was telling me about this exciting project that he was working on inside AWS. That project eventually launched and became AWS Aurora. And I saw a very, very fast growth of adoption of Aurora. And with SingleStore, every time we would go into application migrations, that was frankly a shit show. Right? Because you have an application built against 1 engine, and then you're trying to move the application from 1 engine to another. Turns out there's all these little quirks that prevent you from moving the application over.
And then when Aurora came out and I read about the architecture, I was like, wow, that's an interesting proposition. You don't lose the surface area at all. The surface area of the database product is exactly the same, but you have these additional benefits that you get through the separation of storage and compute. And I started thinking about it, and I couldn't stop thinking about it. That was an interesting artifact of me. Obviously, I was running a company and I didn't have that much time to build a side project there, and it didn't make sense to do it inside SingleStore, but I just couldn't stop thinking about it.
And I spent many years in that idea maze where I was like, well, if I were to build a competitor to Aurora, what am I gonna do exactly? How can I take advantage of open source? How can I take advantage of cloud distribution? What does it mean to be a developer versus infrastructure offering? When I left SingleStore, I joined Khosla Ventures. By that time, the idea was more or less formed in my head up to a point, enough of a point to start a company, and certainly not enough to fully plot the future path of how this is gonna be successful. And walking in, I told Vinod Khosla, who runs Khosla Ventures and is the founder of Khosla Ventures, that I have this idea in me. What do you think? Maybe we can prototype it.
And Vinod says, yeah, for sure. We absolutely need to incubate it. How much money do we need? And I said 10 million, and he said, here is 5. So we got to work, and in this company, we engineered the team. You learn a bunch of stuff in venture as well, and 1 of the things Vinod always says is the team you build is the company you build. And in a way, I started to think about Neon not in terms of the plan, but in terms of the team. Who do you have on the team, who do you need on the team, defines what the plan is actually gonna be and what kinda product this is gonna become.
And so this company was engineered around people who are very Postgres native, because I knew for sure, I knew 100%, that if you diverge, you're not Postgres anymore, you're something else. You become PlanetScale, CockroachDB, whatever that might be. Doesn't matter if you speak the Postgres protocol. You're either Postgres or you're not. And so this company was built around people who contribute to the core Postgres engine. Heikki Linnakangas is a Postgres committer, and Stas was a Postgres contributor. So that became the foundation of Neon.
Then in the beginning, the only insight was that every successful cloud product inside a hyperscaler has an open source alternative. And there are lots and lots of examples of that. For Redshift, that would probably be ClickHouse these days, or DuckDB, 1 of the more popular open source products. But then Redshift also had a cloud native alternative, which is Snowflake. And Snowflake frankly out executed Redshift. So unbundling a popular database service seems like a good idea. And I was also thinking about the GitHub versus GitLab analogy, where GitHub is a cloud product and GitLab is an open source product.
And being an open source product gives you the right to exist. And I was like, nobody is building an alternative to Aurora, and that was strange to me. I even reached out to Alex Rubitsky, who in the 1st few years wrote the most code on that project, and said, why don't we incubate this thing together? But he wanted to stay at Amazon, and I couldn't find other folks that would do a great job on it. And so I couldn't stop thinking about it. And so eventually, we thought, okay, we're gonna just launch an open source Aurora. But then it was also obvious that the money is in the cloud, and building a cloud only product is what creates a lot more focus, as opposed to creating a cloud product and an on prem product.
So what we did is we kinda claimed our open source real estate by announcing to the world that, hey, this is the code base, and this is open source Aurora, or an open source alternative to Aurora. And the code is open, and the code is under a good license. So anybody can watch it, anybody can adopt it. But we only focused on building a cloud product that consumes this code but delivers it as a service. So that was kind of the second insight, and that was the plan from the start, to only be a cloud product. The 3rd and the 4th insight came as we started working on that.
I learned that there's lots of things you can do with database technology. You can work on mega resiliency for multi region deployments. You can work on multi master to increase write throughput, because databases kinda bottleneck on the amount of writes you can send through them today. You can work on analytics. And what would be this defining feature of your product vis a vis what you can get off the shelf from Amazon? And then we realized that serverless was kinda a big deal, and that's when we made a decision to delay our release by at least 6 months and ship it serverless only.
And that 'only' is something that I learned competing with Snowflake, where frankly, SingleStore did too much. We had cloud and on prem. We had OLTP and analytics, and that prevented us from truly competing for what became a big category, right, data and analytics in the cloud. So here, I didn't wanna make the same mistake. So we focused and we said, Postgres only, cloud only, serverless only. So that was kind of the next big insight. Finally, what we're realizing now is, well, our user is a developer, and developers have lots of needs.
We asked ourselves the question, why do people use Neon versus AWS or Azure or GCP? And the answer was always kinda like, oh, it's easy to use. You push a button and you get it. I think the real answer is that small teams that need to move fast don't have the luxury of having DevOps. And if you're using Amazon, you need DevOps. Right? Because Amazon is infrastructure. Amazon is not a developer platform. When you use GitHub, you don't need DevOps. You as a developer consume something that feels super native for you as a developer. But when you use EC2, or the 200 services on AWS, you feel like they're Lego bricks on which you build your application, and it doesn't feel like this is built for the developer as an end user.
So that moment was like, oh, this is what it means. Smaller teams can move faster because they don't have DevOps, and they consume this directly. Once that clicked, we realized that those teams need more than just a database. And if you tune in to Neon, you will find that there is more and more technology we'll be shipping that is database plus plus, database plus more.
[00:12:02] Tobias Macey:
1 of the interesting points that you brought up is that Aurora had a lot of interesting capabilities and functionality of being able to provide this serverless experience, scale to infinity. You don't have to worry about provisioning the number or type of instances. You just throw data in, and it does what it's supposed to do. The problem that I've seen, though, is that it is not actual Postgres or actual MySQL. There are enough edge cases that if you are using it for anything even remotely nonstandard, you're gonna hit problems, and you can't just use it as a complete drop in replacement for MySQL or Postgres. And so your insight of saying that if you're going to do this right, it has to actually be Postgres through and through, I think, is very salient and very well thought through.
And so given the fact that Postgres is a very large and diverse ecosystem, with a lot of different use cases that it's supporting and a massive number of different plug in types, I'm wondering if you can talk to some of the ways that you're thinking about what it means to be serverless for such a diverse ecosystem. What are some of the ways that you're trying to scope the applicability of Neon so that you don't have people coming to you and complaining that, oh, it doesn't do x, y, or z because I'm trying to use these 15 different plug ins? And what are some of the ways that you're orienting towards that developer experience by removing the operational concerns?
[00:13:39] Nikita Shamgunov:
Well, I think there are 2 questions in 1 here. 1 is, how do you maintain compatibility with Postgres when the reality is that the ecosystem is so deep? So what are you changing in Postgres, what are you not changing, and what is the net effect of those changes with regard to compatibility with the ecosystem? And like I said earlier in the call, compatibility with Postgres is paramount. And if you break it, you're on an island. Right? At SQL Server, we used to say 99% compatibility means 99% of your customers have problems. So the compatibility needs to be 100%.
So from the architecture standpoint, and you can look at the architecture of Neon in our documentation, we don't hide the fact that we actually run Postgres. Right? We run Postgres in a VM, and we attach that Postgres to custom built storage, and that thing we built from scratch. So the integration point of Postgres with our storage is a relatively thin API. At the end of the day, Postgres's storage engine requests pages from disk, writes transaction log records, also called WAL, to disk, and then uses the WAL to update those pages both on disk and in memory.
And that's precisely where we intercepted it. So we said, instead of writing a transaction log record to disk, send it over the network into our service, and then instead of requesting a page or reading a page from disk, read it from our service over an API call. And as you see, that allows us to actually not change the engine. Now the reality is, well, you still need to change the engine, because while this looks very, very good on paper, the devil is in the details, and there isn't pluggable storage engine support in Postgres. So we had to do a little bit of surgery. The important bit is the amount of that surgery is not huge, and that allows us to keep the compatibility.
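To make that thin API concrete, here is a minimal sketch of the two operations being redirected, written as TypeScript interfaces purely for illustration. Every name and type below is invented for this sketch; Neon's real integration is C code inside Postgres talking to its storage services, and the page size, identifiers, and LSN handling are all simplified.

```typescript
// Illustrative model only: all names here are invented for the sketch.
// A local storage engine writes WAL records and reads/updates pages on disk.
interface LocalStorage {
  appendWal(record: Uint8Array): void;                     // fsync a WAL record locally
  readPage(relation: string, pageNo: number): Uint8Array;  // read one 8 KB page from disk
}

// The serverless swap: the same two calls, served over the network instead.
interface RemoteStorage {
  // WAL records stream to a replicated log service rather than local disk.
  appendWal(record: Uint8Array): Promise<void>;
  // Pages are requested at a specific WAL position (LSN); the storage service
  // reconstructs the page by replaying WAL on top of an older base image.
  readPage(relation: string, pageNo: number, lsn: bigint): Promise<Uint8Array>;
}
```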
So that's how you attach Postgres to our storage. Now what about the serverless bit? So the way serverless works is we run Postgres in a VM, and we change the size of that VM, adding more memory and CPU to the VM based on the workload, and then removing memory and CPU from the VM if the workload doesn't require as much memory or CPU. Also on paper, that sounds wonderful and easy to do. Well, it turned out that there's a lot going on there. So we had to build a lot of VM expertise internally at Neon to support that. We also thought about running compute nodes in containers.
Well, containers are not really isolated, and they can be hacked or broken out of, which is not ideal. So we needed that security boundary. In addition to that, we wanted our VMs to be able to change hosts. And Postgres is a stateful system. While the state lives in our storage, even the connection to Postgres is stateful. You interact with Postgres by establishing a TCP connection. So if you move your container from 1 host to another, you break that connection. VMs, you can actually live migrate, and even the TCP connection remains. So that was kind of the second reason for us to use VMs. And now there's a ton of VM expertise, because we run hundreds of thousands of them all at the same time on our platform.
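As a rough illustration of the resizing behavior described above, the control loop below grows or shrinks a VM's allocation with load and suspends it when idle. This is a hypothetical sketch, not Neon's autoscaler; the interface, thresholds, and scale-to-zero timeout are all made up.

```typescript
// Hypothetical sketch of a per-VM scaling decision; every name and number is invented.
interface VmHandle {
  computeUnits: number;                  // current CPU + memory allocation
  resize(units: number): Promise<void>;  // hot-add or hot-remove resources in place
  suspend(): Promise<void>;              // scale to zero while storage keeps the state
}

async function autoscaleTick(vm: VmHandle, cpuPct: number, idleSeconds: number) {
  if (idleSeconds > 300) {
    await vm.suspend();                            // idle: stop paying for compute
  } else if (cpuPct > 80) {
    await vm.resize(vm.computeUnits * 2);          // hot workload: grow the VM
  } else if (cpuPct < 20 && vm.computeUnits > 1) {
    await vm.resize(vm.computeUnits / 2);          // cooled down: hand resources back
  }
}
```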
So, basically, the answer to how you deal with the massive ecosystem of Postgres is, well, through that architecture we don't break compatibility, because the engine itself is still Postgres. We're just swapping out storage from under Postgres, and the API to storage is so small that it doesn't impact app compatibility. So developers don't suffer, but do they thrive? That's another question. And some of the things that developers need to thrive, well, some of this stuff is silly. You go and launch an RDS instance; it's not connected to the Internet.
And if you run Cloudflare Workers, it's a gigantic pain to connect your application to the database. Certain things don't support TCP connections, so that's why we launched our serverless driver. Postgres doesn't do very well with lots of connections, and therefore there are systems like poolers, like PgBouncer, that allow you to scale the number of connections to Postgres. So part of the value is just packaging all of that and making it stupid simple to consume: never run out of connections, and never do the operations where you would have to add infrastructure around just the core database.
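For a sense of what that packaging looks like from the application side, here is a minimal query over HTTP using the @neondatabase/serverless driver. This follows the driver's documented tagged-template usage as I understand it; the users table and environment variable are placeholders, so treat the details as an assumption and check the current docs.

```typescript
// Minimal sketch: querying Postgres over HTTP from an environment without raw
// TCP sockets (e.g. an edge worker). The 'users' table and DATABASE_URL are
// placeholders for this example.
import { neon } from '@neondatabase/serverless';

const sql = neon(process.env.DATABASE_URL!); // no socket, no pool to manage

export async function getUser(id: number) {
  // Tagged template: interpolated values are sent as bound parameters.
  const rows = await sql`SELECT id, name FROM users WHERE id = ${id}`;
  return rows[0];
}
```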
But then there's more than that. Think about every application that a modern small team builds. Right? Bigger teams have DevOps teams, SREs; they stand up their own CICD pipelines. But if you're taking things off the shelf, if you're taking GitHub, if you're hosting your front end on Vercel, if you're running your software development life cycle by sending PRs and running tests in GitHub Actions, turns out the database doesn't play nice in that, and we made it play nice. So we have database previews, which is achieved through the technology we call branching. We have the ability to create those previews based on every PR in GitHub, and now we're adding more and more features that integrate deeper with the JavaScript ecosystem. So when you build apps and you need systems like auth or payments or storage, that's also trivial to do on Neon.
So all of that kinda falls under the umbrella that you wanna ship your applications faster. That's really the whole acceleration movement, which is mostly driven by AI, but really by developer productivity. You can crank out those apps much faster now, and for that, infrastructure needs to support them. All of that contributed to the vision that we have at Neon.
[00:20:17] Tobias Macey:
In terms of the engineering that you had to do on Postgres, as you said, Postgres is known for being very pluggable, but the storage engine, at least to date, has not been 1 of those plug in interfaces, though my understanding is that that is changing. I'm wondering how you have had to approach the rework of that Postgres engine to minimize the footprint of your changes while maximizing the capabilities that you're enabling, and some of the ways that the scope and goals of your work on Postgres and Neon have changed from when you first came up with this vision of what you wanted to build to where you are today, where you have a real world production system that people are using every day.
[00:21:08] Nikita Shamgunov:
Honestly, that's not the hardest part. The specific work that's happening on the Postgres engine is: whatever we can push into an extension, we push into an extension, and then the rest we forked. Right? And the way we forked it is we know that this is gonna make it into the core product, either in this form or once pluggable storage engines are introduced. And the amount of changes that we need in the core engine is so small that it's trivial to merge them as a new version of Postgres shows up. So I don't know if it's gonna be Postgres 18 or 19, but by that time, I don't think we're gonna have any differences between Postgres 19 and the Postgres that we run on the platform, and all of that will go into the pluggable storage or extension API.
I think the more interesting question is, like, where do we spend time? Where's the innovation? And the innovation is both at the bottom, in what lives under Postgres: the enormous amount of work that we did building our storage subsystem that is fully elastic, multitenant, integrates with S3. We can run it globally, around the world. That's kinda a marvel. The size of that project is similar to that of, maybe, Pure Storage. That would be a comparable for us, except Pure Storage is an appliance and we are a cloud service. And then another piece of work is above the database.
So not only did we make it serverless via that VM technology, but we also put a very nice developer veneer on top. And that goes into: 1st, it's serverless; 2nd, it's consumable from HTTP. Now it's pluggable into Next.js and all these modern JavaScript frameworks, supports the software development life cycle, integrates with Vercel. There's, like, thousands of people using our Vercel integration, connecting database previews with Vercel previews. And there's the stuff that is coming down the pipe around authentication, around storage, around payments, kinda like more of a backend as a service platform, not just the database.
And I think that's the right direction for us. While the database itself is very, very valuable, I think fundamentally we're delivering shorter cycles, speed, to developers. Our slogan is ship faster with Postgres, so that's why we have to take over more of the app to allow our users to ship faster, and that's where a lot of innovation is coming in.
[00:24:03] Tobias Macey:
On that note of the branching capabilities, obviously, that maps more closely to the ways that developers think about developing and deploying and debugging features. How has pushing that capability into the database changed the way that development teams approach their iteration cycles and the ways that they think about actually managing their workflows and debugging capabilities?
[00:24:31] Nikita Shamgunov:
I think it all starts with a specific pain. So let's start with the pain. What's the pain? Today, if you run a production system and you wanna stand up a staging environment, you need to move data from the production system into the staging environment. I'm not even talking about how fresh this data is. I'm just talking about, give me a snapshot as of, I don't know, yesterday or today, a few hours ago. Give me the snapshot of that data in my staging environment. Well, it turns out, for whichever reason, that's hard. Right? It's not that easy to do. Now I have my staging environment, but I'm sharing that staging environment with my whole team. Let's say my team has tens of people, maybe hundreds of people.
They all need a staging environment, and they're all changing schema because they're building the app. Now they're conflicting both on resources and on the other centralized resource, which is just the state of that database. And if you wanna have a staging environment per developer, and God forbid they also test performance, now you have hundreds of copies of that. Not only is that hard to manage, it's also inefficient from the cost standpoint. Now imagine an alternative. Let's just say it's trivial to create a staging environment by creating a branch of the production environment. From there, you may or may not, small teams don't, but larger teams definitely do, mask or override all the PII data. But then once you have that staging environment, how can you have developers create developer environments without breaking the bank? Each developer environment shouldn't cost you very much.
And it should still allow you to run performance tests if you want. And then the other 1 is, as you develop features and they all conflict on the database schema, how do you make sure that you resolve those conflicts? And this whole thing plugs into your CICD pipeline. The fundamental primitive that we have is database previews, which we call branches today, but we're actually changing that language. We're gonna call it previews everywhere. And when you create a preview, it gives you a full copy of your data, data and schema, and it's isolated. So for a developer, it's yours.
Underneath, storage does this smart copy on write thing, where the cost of creating a copy is 0. Right? So it's very quick. And then compute is just separate. Right? It's a different VM that runs Postgres, and that's your compute. So that's the definition of separation of storage and compute, and we're taking advantage of that architecture here. Now in your developer environment, you can do whatever you want. You can change data. You can change schema. You can test performance. You can drop indexes, create indexes, whatever. But then you want to roll these things forward, first into the staging environment, and then eventually the production environment. What we've discovered is that people don't really care about the changes in data. As a matter of fact, the data changes in the dev environment should not propagate all the way to production. But the application depends on the schema, so the schema has to migrate forward. There are lots of tools that help you with schema migrations.
Those are called ORMs, things like Prisma, Drizzle, TypeORM, and whatnot. And we're just plugging into that workflow. So we're thinking very hard both about what is our place in the sun and what we should grab from the ecosystem and be orthogonal to. So ORMs run migrations within the context of 1 database, but it's certainly not in their power to generate database previews and give you this fancy forking capability. So that's on us. But then we package it all such that it's trivial to set up the software development pipeline and life cycle, where for every feature, you go to staging and create your development branch. If you wanna create a fresh 1 every single time, you can just refresh your dev branch from whatever is the current thing in staging. Maybe you wanna skip staging and just do it directly from production, which is totally fine as well. Develop your feature, send the PR, and from that point on, we got it. So that really speeds up the cycle, to be honest, and we're super excited to see our customers taking full advantage of that technology.
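As a sketch of how a CI pipeline might create one of these previews per pull request, the snippet below calls Neon's HTTP API to create a copy-on-write branch. The endpoint shape is simplified from the public API as I recall it, and the project ID, branch naming, and payload are assumptions; check the current API reference before relying on the details.

```typescript
// Hypothetical CI step: create a copy-on-write database preview for a PR.
// Endpoint path and payload are assumptions based on Neon's public HTTP API.
const NEON_API = 'https://console.neon.tech/api/v2';

async function createPreviewBranch(projectId: string, prNumber: number, apiKey: string) {
  const res = await fetch(`${NEON_API}/projects/${projectId}/branches`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    // The branch shares pages with its parent; only divergent writes cost storage.
    body: JSON.stringify({ branch: { name: `preview/pr-${prNumber}` } }),
  });
  if (!res.ok) throw new Error(`branch creation failed: ${res.status}`);
  return res.json(); // connection details for the isolated preview database
}
```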
[00:29:01] Tobias Macey:
With the separation of compute and storage, you're creating another hop for that data to flow through. And I'm wondering about some of the ways that you are thinking about the impact on latency, the impact on reliability, and the ways that you're engineering around that problem to get the best of a fully integrated stack of Postgres, where it's all running in 1 unit, but also the scalability, in terms of both compute and storage and pricing, of being able to actually separate those tiers, and the additional layers that you've had to work in to be able to mitigate those latency or performance impacts.
[00:29:38] Nikita Shamgunov:
This is a very fair question. The important thing to understand is that if you run a highly available environment with your classical deployment, where you have 2 or 3 nodes, there is a network hop there anyway. So when you write into the primary node, the transaction is then sent over a network hop to a replica. And those are, quote unquote, synchronous replicas. So that write needs to be acknowledged by the replica, and only then can you acknowledge the transaction that you sent to the primary. So in a highly available environment, the hop is already there. If you run Postgres on an EBS volume, well, EBS is network attached as well. So we're not really adding hops. Actually, we do, but at a high level, it's roughly the same number of hops. In reality, there's a Paxos protocol that we use for reliability when we send the log record into our service, to what are called safekeepers.
So there are multiple hops to persist the record in the Paxos protocol. But it's not like you can avoid network hops altogether in some other architecture; you can't. The latencies and throughput are fundamentally becoming roughly the same. And roughly, there's still a bit of a haircut that we're taking on latencies. But in return, we're giving you infinite IO throughput. Right? Because our storage is multitenant, and we can serve as many pages as you want. So that's the trade off, and it specifically works super well for much larger databases. And for small databases, performance usually is not a problem. So that's the answer to the question of, how do you deal with a network hop? Are you strictly worse? And the answer is, well, not really. You have those network hops anyway.
[00:31:38] Tobias Macey:
Another aspect of Postgres from an operational perspective that anybody who has run it for long enough has gotten bitten by is the upgrade process, where you have to deploy the new node, but you have to keep the old version around to be able to do the upgrade of the storage engine, and it's always this complicated dance. And I'm wondering how you're thinking about removing that pain for the end user and some of the ways that you, as a platform operator, are addressing the automation and scalability of that upgrade cycle.
[00:32:00] Nikita Shamgunov:
Coming from SQL Server, and it's been 15 years since I left SQL Server, the fact that Postgres doesn't have online upgrades bewilders me. And the way that the upgrade process is set up in vanilla Postgres is frankly strange. SQL Server, you just restart. You shut down the old binary, start the new binary, point it to the data location, and it just upgrades on the spot, in place. And the SQL Server team makes sure that the upgrades never fail. They kinda guarantee that this is the case. Here, you have to do a bunch of dance to upgrade a Postgres instance, but we just treat it as a feature. By the way, we don't have that feature yet, but this feature is under development. It's not difficult, but think about it. We run a cloud service. There is a playbook for how to upgrade vanilla Postgres. We apply that playbook for our instances. It's trivial for us to stand up a particular version of Postgres in that micro VM that attaches to storage. Of course, we need to do a bunch of manipulations so that storage is in the right format, so you can attach the next version. Yeah. It's a feature. We'll build it. It's not there yet.
[00:33:14] Tobias Macey:
And also from the fact that you are focusing on the developer community, how much does version factor into their end user experience of, oh, I wanna run Postgres? Is it, okay, well, which version do you want? Or is it just, okay, here's the latest, and we'll make sure you stay on the latest?
[00:33:35] Nikita Shamgunov:
We debated that. We let people choose the Postgres version today. I was actually advocating not to. I was saying let's just run the latest version and upgrade ourselves, but then we didn't have the upgrade feature for a while, and we still don't have it. It's coming. So we landed somewhere in between. When a new Postgres version shows up, the default Postgres that we spin up is the latest version. We don't upgrade automatically, and we let people choose up to 2 versions back. And so far, the architecture of our storage allows us to do that. Again, it's a testament to the level where we plugged in. We plugged in at the page level, and pages don't care about the version. So that all works. I think there are benefits to just being on the latest version. I just lost that argument when we were introducing that feature, but we haven't been bitten by it much. And the reason we haven't been bitten by it much is because Postgres is fairly disciplined and regimented in how it releases. It releases once a year.
Not that much stuff changes. Developers, for the most part, don't care much whether they're on this version or that version. Every now and then, some good developer features like JSON show up, and developers care about those. Otherwise, it's like Linux. Right? Linux kinda works, this version or that version. Only the operator really cares, but the end user doesn't care as much.
[00:35:03] Tobias Macey:
On that note of developer focused features, the topic that has sucked all the oxygen out of the room for everything else in tech is AI and generative models. Commensurate with that is the rise of vector databases. Postgres has the pgvector extension. As somebody who is running a platform as a service for Postgres, what are some of the ways that you're thinking about the utility of, the messaging around, and the impact on your business of pgvector and the ways that it incorporates with the Postgres ecosystem?
[00:35:38] Nikita Shamgunov:
Oh, it's been huge. We're actually contributing to pgvector. There's a nice story there. Heikki, I think, is the number 2 contributor, still much smaller than the creator of pgvector, Andrew Kane, but nonetheless number 2 contributor. We found a way to improve pgvector a year ago. We realized that there is this index called HNSW, and we thought it should be in pgvector. We didn't have a way to contribute it at the time, so we built an alternative extension called pg_embedding that demonstrated material improvements over the IVFFlat implementation of the index that pgvector had, and still has; now you can choose. Once we showed the science, Andrew started to work on HNSW and introduced HNSW in pgvector. He's done a great job, and that basically prompted us to retire pg_embedding. And we took all the knowledge that we'd collected by building pg_embedding.
And what was applicable, we contributed back to pgvector. So that was our experience. From the business perspective, oh, it's wonderful. It's wonderful that this thing is there. We obviously support it. We're contributors. Also, our architecture makes it better to run pgvector on Neon versus other platforms. Specifically, when Neon builds an index, and this is a very compute heavy operation, because you do a lot of vector math as you create that index, so super compute heavy, super memory heavy as well, Neon can temporarily give you more compute and memory on demand and then shrink it back down. So you don't need to commit to very large instances ahead of time when you use pgvector. So that was great.
People build AI apps. Each AI app needs stuff. Right? 1 of the biggest things, what makes an AI app an AI app, is you talk to an LLM. Well, that's not us. But when you build a RAG application, you do need a vector database, and then the rest of the application chooses Postgres anyway. So that's what we do with Neon, and so far, that's been working great for us. I think there's gonna be more and more demand, and especially as we add more developer features to the platform in addition to just the database, we'll see more demand for AI relevant features. We have a bunch in the pipeline, and we'll be announcing them kinda soon.
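To ground the pgvector discussion, here is what a similarity lookup in a RAG application can look like, using standard pgvector syntax through the same serverless driver sketched earlier. The docs table, embedding column, and choice of an HNSW index are assumptions made for the example.

```typescript
// Sketch of a RAG-style nearest-neighbor query with pgvector. Assumes a table
// created roughly like:
//   CREATE TABLE docs (id bigint, title text, embedding vector(1536));
//   CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
import { neon } from '@neondatabase/serverless';

const sql = neon(process.env.DATABASE_URL!);

export async function similarDocs(queryEmbedding: number[], k = 5) {
  // <=> is pgvector's cosine-distance operator; the HNSW index serves the ORDER BY.
  return sql`
    SELECT id, title
    FROM docs
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${k}
  `;
}
```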
[00:38:08] Tobias Macey:
Another element that we've touched on throughout is the fact that everything you're building is being released as open source and permissively licensed. I'm wondering how you think about the relationship between the open source code and your business model and the overall sustainability of both.
[00:38:30] Nikita Shamgunov:
So we're exposed to hyperscalers. I don't think we're exposed to anybody else. In order for you to run a service like Neon, you need to have several pieces of expertise. 1 is, well, you need to understand what's written in the code. It's very scary to run somebody else's systems code. And for an operational database, if there is a bug, you need to fix the bug. So you need to build that expertise. You also need to stand it up, set up all the observability and upgrade systems, basically set up the processes that allow you to run it well, and then you need to build a team of committers that touches every part of the stack.
So for a startup, that's next to impossible. For a large company like Amazon, Microsoft, Google, it is possible. It's also possible to build this whole thing from scratch, frankly, for Amazon, Microsoft, and Google, and Amazon has already done it with Aurora. Microsoft is a little behind, and that's why we are actually partnering with Microsoft. So, you know, tune in for some announcements. And then Google has a project called AlloyDB, which I think is just a bit behind us with the stuff that we can do. So I don't know. It is possible to, quote unquote, steal it, but only a handful of companies actually can.
In the US, it's Amazon, Google, and Microsoft. We're partnering with Microsoft, and Google and Amazon have already done it. So I think we're good.
[00:40:09] Tobias Macey:
As you mentioned, your previous company was another database company. I'm wondering, what are the lessons that you learned in the process of building and growing SingleStore, previously known as MemSQL, that have been most useful to you in the work that you're doing on Neon?
[00:40:27] Nikita Shamgunov:
The first 1 is focus. Right? We did too much at SingleStore. It's north of a $100,000,000 run rate business, so we didn't fail, but we didn't take it public yet, and it's been some time. And if you kinda zoom in on the reason, it's that we did too much. We were on prem and we were in the cloud. We were supporting operational workloads and analytical workloads, and the problems were different in each 1. And a shared nothing architecture, both for analytics and operational workloads, breaks compatibility with the mothership. For us, the mothership was MySQL.
We didn't use MySQL code because it's GPL, but we used the MySQL protocol and syntax, and it turned out all the subtle bugs in the compatibility is something that I have a lot of scars from. So we're certainly fixing that with Neon: we're not breaking compatibility. And then on the analytics side, well, cloud and object stores were something that we ignored for a while and then eventually caught up on, but that was kinda too late. So I think the big 1 is focus, and then driving very, very hard towards becoming the default. Maybe that will take some time, but for the outside observers, and then later for customers and partners, it's very, very clear where you're going. We want to become the default development platform for Postgres, and therefore our architecture and marketing follow from that.
So if you do all the top line right, your technology is very, very solid, your positioning is very, very solid, your developer experience is solid, your design is solid, then the bottom line kinda follows, and that's our intention. I think at SingleStore, I personally had a lot more energy. I lived in the office and slept next to the servers, but I lacked that maturity and focus, which I'm bringing here at Neon.
[00:42:43] Tobias Macey:
In the work that you're doing on Neon and the ways that you're seeing people use it for their own use cases, what are some of the most interesting or innovative or unexpected ways that you're seeing Neon used?
[00:42:54] Nikita Shamgunov:
The stuff that we didn't expect, and people do it a lot, is using 1 database, 1 instance, per tenant. They're like, well, they're kinda cheap. They stand up in a couple hundred milliseconds. I'm just gonna run a full blown database server, full Postgres, for 1 user. And we didn't expect that. Now there are companies that run fleets of instances, and it actually works really well if you have uneven consumption on a per client basis. You have a long tail of customers that barely use it? Okay, well, you're basically paying 0 for those with Neon. And then some use it quite a bit, and for that, you need elastic compute.
So it allows you to kinda right size your usage very well. The second thing that we didn't expect at all is that people run what if scenarios with our branching capability. There's a specific financial planning product where every customer is exploring the impact of certain changes. So they create a branch and then go hog wild on the branch, change data, reads, writes, whatever, and then compute the final result that gives you an answer on whether the what if scenario is successful or not. If it's successful, you proceed. If not, you throw out the branch. That was another unexpected thing; I didn't even know that scenario existed. The rise of our serverless driver was another surprise to me. It turned out JavaScript developers don't know what a connection is and what a socket is, and I think it's great, actually.
Like, nobody needs to know. So JavaScript engineers consume Neon using our serverless driver that allows you to query Neon over HTTP. That was another interesting surprise. What else? Yeah, those are probably the 3 most interesting ones. pgvector caught us by surprise as well. Very quickly after we launched, people were asking, do you have pgvector? And so that became kind of the expected standard.
[00:45:10] Tobias Macey:
And in your experience of building the Neon product, scaling the business, what are the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:45:20] Nikita Shamgunov:
I think we learned to automate a lot here. We actually don't scale anything with people; everything is scaled by technology. So for example, there's a feed on my Slack for new customer upgrades. People start with the free tier, and when they upgrade, I get a notification on the feed. There's telemetry on everything. And then there's a data team that is very busy producing all of that for us. We make decisions based on data, and we think about the world as kinda like this real estate. We know we have a good product, so we just need to claim that real estate in the world and then measure everything.
So that, I think, was also a difference between operationally running Neon versus running SingleStore. Also, Neon is fully remote, which allows us to tap into unbelievable talent around the world, at the cost of communication overhead. So I still don't know which 1 I like better. It's very fun to go to offsites and meet the team, all these people who do incredible work, and see them in person. And at the same time, you do pay the communication overhead for people working from home. So, yeah, those are the learnings.
[00:46:51] Tobias Macey:
As you were talking about the data that you're collecting, the dashboards that you're building, obviously, to run a business, you need a database. It makes sense that you would use Neon internally as well. What are the cases where you're actually not using Neon and you need to turn to a different data engine?
[00:47:08] Nikita Shamgunov:
We have 2 more data engines, at least, maybe more. 1 is powering our Grafana dashboards, because for observability, you need an observability engine. And then we use Snowflake for all the reporting. We are at 60 terabytes of data in Snowflake, and this number is shocking to me. Like, how come tiny Neon generated so much data? But we are where we are. Postgres is just not good for that. It's interesting that there's more and more work going into Postgres to support data and analytics scenarios. It's gonna be a while until it's a full fledged data warehouse.
But think about it. Postgres is like Linux. It's a commodity for us. And a vectorized column store query processor is the future at the end of the day. SQL Server and Oracle have those, so we'll have them in Postgres too, and that's coming. And integration with the data lakes is coming as well. There are already plugins that people discuss on Hacker News that provide such functionality, that allow you to create Parquet files. And in the future, it's gonna be Iceberg integration, Delta and Parquet integration. So all of that is coming into the platform.
Where it stands today, though, for analytics: it's good for small scale. It's not very good for our scale. And for that, we use Snowflake.
[00:48:32] Tobias Macey:
The cases where Postgres is not the right choice are something that many people have already discussed in various contexts. But for the case where Postgres is the right choice, when is Neon not the right way to run Postgres?
[00:48:47] Nikita Shamgunov:
Well, there is the meme going around of, just use Postgres. And I think it's great for us. I think it's great for the industry. Again, I think the era of lots and lots of built for purpose database engines is coming to an end. You certainly still need a data warehouse or a data lake, and history will tell if actually all you need is a data lake, or you need a data warehouse and a data lake. We'll see. And then you need an operational database, so that's Postgres. Then there are all the other things that you potentially need as well.
I think over time, they will all go 1 way or another, meaning they will either be part of the data warehouse or they will be part of an operational database. Whether operational databases and analytical databases are gonna be 1, that I don't know. I tried with SingleStore, and again, we scaled that to close to $100,000,000 in run rate. I don't think we had enough of an industry impact to say you just need 1. Maybe. Maybe 1 day. But I still don't know, even after 12 years of SingleStore, if it's 10 years out or 20 years out. It's certainly not 2, 3, 4 years out, because not only do you need to build the technology, you need to change how people build software, and that's a tall order.
So I would say, don't use Postgres for large scale analytics today, and don't use a data warehouse to power your operational apps, for OLTP. In between, you can decide which way it goes if you have something in between. Everything else will kinda be pulled into 1 of those 2.
[00:50:43] Tobias Macey:
And as you continue to invest in Neon, what are some of the things you have planned for the near to medium term or any projects or problem areas you're excited to explore further?
[00:50:54] Nikita Shamgunov:
Oh, there's a ton. So, more clouds for sure. I think we want to be the default Postgres offering everywhere in the world, and we're gonna launch another cloud this year. So I'm super excited about that. That's 1. We're gonna add more developer features. We're gonna make it much easier to build and manage auth, payments, and storage with Neon, and we'll do it with some partners. So super excited about that as well. We're gonna launch our GitHub app, which will make a much tighter integration for you with CICD and automatic creation of previews.
We're gonna start bridging, not replacing, but integrating Neon with the data lakes. So that's another super exciting part. And then we're gonna launch more platforms, meaning not our platforms, but other platforms that use Neon as the default database provider. So, yeah, that's a lot. I'm excited to pull all of that off and launch well, with high production quality. So I'm looking forward to all that.
[00:52:07] Tobias Macey:
Are there any other aspects of the Neon project and the business that you're building around it or the Postgres ecosystem that we didn't discuss yet that you would like to cover before we close out the show?
[00:52:18] Nikita Shamgunov:
1 thing that is not obvious to everybody is how few people actually move the Postgres project forward. And there is a certain amount of aging happening among the core contributors to Postgres. So I think what would be very useful for the industry, not just for us, and we are contributing in a small way: we have a Postgres team, and Neon engineers contribute to the core Postgres project, even in places where it's not obvious how that benefits Neon, outside of just, well, Postgres gets better. So we do some of that work, and Heikki continues some of that work, and he reviews patches.
The industry should train more people who are Postgres kernel engineers because of that aging problem. The absolute top contributors to the Postgres engine are now in their fifties and sixties. So it would be nice if more systems engineers from around the world, younger systems engineers, started to contribute to Postgres. This is a call for engineers and also a call for the industry to sponsor this work. And the best way to sponsor this work is, if you have a high dependency on Postgres, if you're running lots of Postgres instances in production, it's not that expensive for a big business to have some of those engineers contribute to the Postgres kernel.
[00:53:48] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:54:06] Nikita Shamgunov:
Well, can I talk about AI?
[00:54:09] Tobias Macey:
Absolutely.
[00:54:12] Nikita Shamgunov:
I think... well, we stood up a data team at Neon, and even for a small company like ours, it wasn't easy. I think there should be an AI data engineer that can kinda put it all together for you, and I'm describing that problem in a very broad way because I don't wanna lead to the answer. Right? This is definitely not text to SQL. Text to SQL is like a tiny piece of that problem. The problem is, I don't have a data practice. I stand up this thing, and that thing figures out and stands up the data practice for me, and acts like a human, and that's tricky. But I think it's possible, because now we see those systems like Devin from Cognition Labs, you see all this AI engineer type work, we see magic.dev, and the stuff that people are showing is quite magical. So you're like, okay, well, it's coming.
I think that's 1 of the things that data management is missing. You can go further. You can say, well, a data warehouse is a gigantic calculator. Right? It's a gigantic calculator, but in order to take advantage of that calculator, you need to really organize data: put it into tables and columns, obsess about the schema, understand that it has a semantic meaning to it. But imagine a gigantic brain that you can just shove data in, and that thing makes sense of that data and then answers business questions. So I don't know what this means for the future of data warehouses as they exist today.
Now again, I'm thinking a little bit far ahead on this, but if we dream a little bit, then we may find unusually different architectures for data and analytics that are fully AI driven. And not just AI on top of a data warehouse, but maybe changing the architecture of the whole data warehouse. But, you know, we'll see.
[00:56:28] Tobias Macey:
It's definitely a very interesting future that I'll be excited to see how it develops. So thank you very much for taking the time today to join me and share the work that you and your team are putting into the Neon project. It's a very exciting product, so I'm excited to see the ways that it continues to develop. Thank you again for all the time and effort that you're all putting into that, and I hope you enjoy the rest of your day.
[00:56:54] Nikita Shamgunov:
100%. Thank you so much.
[00:57:02] Tobias Macey:
Thank you for listening. Don't forget to check out our other shows, podcast dot in it, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts at data engineering podcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the Data Engineering podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for. Starburst has complete support for all table formats, including Apache Iceberg, Hive, and Delta Lake. And Starburst is trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst today and get $500 in credits to try Starburst Galaxy, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey, and today I'm interviewing Nikita Shamgunov about his work on making Postgres a serverless database at Neon. So, Nikita, can you start by introducing yourself?
[00:01:01] Nikita Shamgunov:
Great being here. My name is Nikita. I'm working, like you said, on serverless Postgres. It's been a very fun journey. We started 3 years ago, on March 1, 2021, with 3 guys and a slide deck. And now Neon is a company with a 100 people and hundreds of thousands of databases under management.
[00:01:24] Tobias Macey:
And do you remember how you first got started working in data? Yes.
[00:01:28] Nikita Shamgunov:
I think I always was fascinated with databases. I started to use PHP and MySQL while still going to college, because I was kinda moonlighting and trying to make a little bit of money while studying computer science back home in Russia. And then my first real job, like, real real job, was at SQL Server, which is a flagship database product at Microsoft. Before I joined there, I had no idea how large scale systems are being built. And I really cut my teeth on database architecture, fundamentals, and how to get quality at SQL Server. So that was my first intro.
[00:02:09] Tobias Macey:
Now in terms of the Neon project, can you give a bit of an overview about what it is that you're building and some of the story behind how it came to be and why you decided that you want to spend your time and focus on it?
[00:02:21] Nikita Shamgunov:
I actually thought about Neon while working on SingleStore, my previous company. SingleStore was designed to be a scale out system, and we had a very ambitious vision of becoming the system that can run both transactional workloads and analytical workloads in a kinda globally distributed system. Lots of learnings, certainly, on that project. And then at some point, I was seeing the rise of Postgres, because every single customer of SingleStore, and we had very large customers, had Postgres somewhere. Right? They would put some really large scale workload, usually either an analytical workload or a real time analytics workload, on SingleStore, and a part of the data feed that was going into that would always come from Postgres.
And so I was just seeing Postgres everywhere. Also at the same time, my prior mentor at SQL Server, whose name is Alex Rubitsky, was telling me about this exciting project that he was working on inside AWS. That project eventually launched and became AWS Aurora. And I saw a very, very fast growth of adoption of Aurora. And with SingleStore, every time we would go into application migrations, that was frankly a shit show. Right? Because you have an application built against 1 engine, and then you're trying to move the application from 1 engine to another. Turns out there's all these little quirks that prevent you from moving the application over.
And then when Aurora came out and I read about the architecture, I was like, wow, that's an interesting proposition. You don't lose the surface area at all. The surface area of the database product is exactly the same, but you have these additional benefits that you get through the separation of storage and compute. So I started thinking about it, and I couldn't stop thinking about it. That was an interesting artifact for me. Obviously, I was running another company and I didn't have that much time to build a side project there, and it didn't make sense to do it inside SingleStore, but I just couldn't stop thinking about it.
And I spent many years in that idea maze where I was like, well, if I were to build a competitor to Aurora, what am I gonna do exactly? How can I take advantage of open source? How can I take advantage of cloud distribution? What does it mean to be a developer versus infrastructure offering? When I left SingleStore, I joined Khosla Ventures. By that time, the idea was more or less formed in my head up to a point, enough of a point to start a company, and certainly not enough to fully plot the future path of how this is gonna be successful. And walking in, I told Vinod Khosla, who runs Khosla Ventures and is its founder, that I have this idea in me. What do you think? Maybe we can prototype it.
And Vinod says, yeah, for sure. We absolutely need to incubate it. How much money do we need? And I said 10,000,000, and he said, here is 5. So we got to work, and we engineered the team for this company. You learn a bunch of stuff in venture as well, and 1 of the things Vinod always says is the team you build is the company you build. And in a way, I started to think about Neon not in terms of the plan, but in terms of the team. Who do you have on the team, and who do you need on the team, defines what the plan is gonna be and what kinda product this is gonna become.
And so this company was engineered around people who are very Postgres native, because I knew for sure, I knew 100%, that if you diverge, you're not Postgres anymore, you're something else. You know, be it PlanetScale, CockroachDB, whatever that might be. Doesn't matter if you speak the Postgres protocol. You're either Postgres or you're not. And so this company was built around people who contribute to the core Postgres engine. Heikki Linnakangas is a Postgres committer, and Stas was a Postgres community contributor. So that became the foundation of Neon.
Then in the beginning, the only insight was that every successful cloud product inside a hyperscaler has an open source alternative. And there are lots and lots of examples of that. Right? For Redshift, that would probably be ClickHouse these days, or DuckDB, 1 of the more popular open source products. But then Redshift also had a cloud native alternative, which is Snowflake. And Snowflake frankly out executed Redshift. So unbundling a popular database service seems like a good idea. And I was also thinking about the GitHub versus GitLab analogy, where GitHub is a cloud product and GitLab is an open source product.
And being an open source product, that gives you the right to exist. And I was like, nobody is building an alternative to Aurora, and that was strange to me. I even reached out to Alex Rubitsky, who, you know, in the first few years wrote the most code on that project, and said, why don't we incubate this thing together? But he wanted to stay at Amazon, and I couldn't find other folks that would do a great job on it. And I couldn't stop thinking about it. So eventually we thought, okay, we're gonna just launch an open source Aurora. But then it was also obvious that the money is in the cloud, and building a cloud only product is what creates a lot more focus, as opposed to creating a cloud product and an on prem product.
So what we did is we kinda claimed our open source real estate by announcing to the world that, hey, this is the code base, and this is open source Aurora, or an open source alternative to Aurora. The code is open, and the code is under a good license. So anybody can watch it, anybody can adopt it. But we only focused on building a cloud product that consumes this code and delivers it as a service. So that was kind of the second insight, and that was the plan from the start, to only be a cloud product. The 3rd and the 4th insights came as we started working on that.
I learned that there's lots of things you can do with the database technology. You can work on mega resiliency for multi region deployments. You can work on multi master to increase write throughput, because databases kinda bottleneck on the amount of writes you can send through them today. You can work on analytics. And what would be the defining feature of your product vis a vis what you can get off the shelf from Amazon? And then we realized that serverless was kinda a big deal, and that's when we made the decision to delay our release, and we delayed our release by at least 6 months and shipped it serverless only.
And that "only" is something that I learned competing with Snowflake, where frankly, SingleStore did too much. We had cloud and on prem. We had OLTP and analytics, and that prevented us from truly competing for what became a big category, right, data and analytics in the cloud. So here, I didn't wanna make the same mistake. So we focused and we said, Postgres only, cloud only, serverless only. So that was kind of the next big insight. Finally, what we're realizing now is, well, our user is a developer, and developers have lots of needs.
We asked ourselves the question, why do people use Neon versus AWS or Azure or GCP? And the answer was always kinda like, oh, it's easy to use. You push a button and you get it. I think the real answer is that small teams that need to move fast don't have the luxury of having DevOps. And if you're using Amazon, you need DevOps. Right? Because Amazon is infrastructure. Amazon is not a developer platform. When you use GitHub, you don't need DevOps. You as a developer consume something that feels super native to you as a developer. But when you use EC2 or, you know, the 200 services on AWS, it feels like Lego bricks on which you build your application, and it doesn't feel like this is built for the developer as the end user.
So that moment was like, oh, this is what it means. Smaller teams can move faster because they don't have DevOps, and they consume this directly. Once that clicked, we realized that those teams need more than just a database. And if you tune in to Neon, you will find that there is more and more technology we'll be shipping that is database plus plus, database plus more.
[00:12:02] Tobias Macey:
1 of the interesting points that you brought up is that Aurora had a lot of interesting capabilities and functionality in being able to provide this serverless experience, scale to infinity. You don't have to worry about provisioning the number and type of instances. You just throw data in, and it does what it's supposed to do. The problem that I've seen, though, is that it is not exactly Postgres or exactly MySQL. There are enough edge cases that if you are using it for anything even remotely nonstandard, you're gonna hit problems, and you can't just use it as a complete drop in replacement for MySQL or Postgres. And so your insight of saying that if you're going to do this right, it has to actually be Postgres through and through, I think, is very salient and very well thought through.
And so given the fact that Postgres is a very large and diverse ecosystem, with a lot of different use cases that it's supporting and a massive number of different plug in types, I'm wondering if you can talk to some of the ways that you're thinking about what it means to be serverless for such a diverse ecosystem, some of the ways that you're trying to scope the applicability of Neon so that you don't have people coming to you and complaining that, oh, it doesn't do x, y, or z because I'm trying to use these 15 different plug ins, and some of the ways that you're orienting towards that developer experience by removing the operational concerns.
[00:13:39] Nikita Shamgunov:
Well, I think there are 2 questions in 1 here. 1 is, how do you maintain compatibility with Postgres when the reality is that the ecosystem is so deep? So what are you changing with Postgres, what are you not changing, and what is the net effect of those changes with regard to compatibility with the ecosystem? And like I said earlier in the call, the compatibility with Postgres is paramount. And if you break it, you're on an island. Right? At SQL Server, we used to say 99% compatibility means 99% of your customers have problems. So the compatibility needs to be a 100%.
So from the architecture standpoint, and you can look at the architecture of Neon in our documentation, we don't hide the fact that we actually run Postgres. Right? We run Postgres in a VM, and we attach that Postgres to custom built storage, and that thing we built from scratch. So the integration point of Postgres with our storage goes through a relatively thin API. You know, at the end of the day, the Postgres storage engine requests pages from disk, writes transaction log records, also called WAL, to disk, and then uses the WAL to update those pages both on disk and in memory.
And that's precisely where we intercepted it. So we said, instead of writing a transaction log record to disk, send it over the network into our service, and then instead of requesting a page or reading a page from disk, read it from our service over an API call. And as you see, that allows us to actually not change the engine. Now the reality is, well, you still need to change the engine, because while this looks very, very good on paper, the devil is in the details, and there isn't pluggable storage engine support in Postgres. So we had to do a little bit of surgery. The important bit is the amount of that surgery is not huge, and that allows us to keep the compatibility.
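To make that thin API concrete, here is a conceptual sketch of the 2 interception points being described. Every name in it is illustrative rather than Neon's actual interface, since the real integration lives inside the Postgres C code; it only shows the shape of the exchange: WAL goes out over the network, pages come back on demand.

```typescript
// Conceptual sketch only: illustrative names, not Neon's real interfaces.
// The idea: Postgres normally writes WAL and reads pages from local disk;
// here both operations become calls to a remote storage service.

interface WalRecord {
  lsn: string;          // log sequence number identifying this record
  payload: Uint8Array;  // the WAL bytes Postgres would have written to disk
}

interface PageStore {
  // Replaces "write WAL record to local disk": stream it to the service.
  appendWal(record: WalRecord): Promise<void>;

  // Replaces "read page N from local disk": the service materializes the
  // page as of a given LSN by replaying the WAL it has received.
  getPage(relation: string, pageNo: number, atLsn: string): Promise<Uint8Array>;
}
```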
So that's how you attach Postgres to our storage. Now what about the serverless bit? The way serverless works is we run Postgres in a VM, and we change the size of that VM, adding more memory and CPU based on the workload, and then removing memory and CPU if the workload doesn't require as much. So on paper, that also sounds wonderful and easy to do. Well, it turned out that there's a lot going on there. So we had to build a lot of VM expertise internally at Neon to support that. We also thought about running compute nodes in containers.
Well, they're not really isolated, and containers can be hacked or broken out of, which is not ideal. So we needed that security boundary. In addition to that, we wanted our VMs to be able to change hosts. And Postgres is a stateful system. While the state lives in our remote storage, even the connection to Postgres is stateful. You interact with Postgres by establishing a TCP connection. So if you move your container from 1 host to another, you break that connection. VMs, you can actually live migrate, and even the TCP connection remains. So that was kind of the second reason for us to use VMs. And now there's a ton of VM expertise here, because we run hundreds of thousands of them all at the same time on our platform.
So, basically, the answer to how you deal with the massive ecosystem of Postgres is, well, through that architecture we don't break compatibility, because the engine itself is still Postgres. We're just swapping out storage from under Postgres, and the API to storage is so small that it doesn't impact app compatibility. So developers don't suffer, but do they thrive? That's another question. And some of the things that developers need to thrive, well, some of this stuff is silly. You know, you go and launch an RDS instance. It's not connected to the Internet.
And if you run Cloudflare Workers, it's a gigantic pain to connect your application to the database. You know, certain environments don't support TCP connections, so that's why we launched our serverless driver. Postgres doesn't do very well with lots of connections, and therefore there are systems like poolers, like PgBouncer, that allow you to scale the number of connections to Postgres. So part of the value is just packaging all of that and making it stupid simple to consume, so you never run out of connections and never have to do the operations of bolting extra infrastructure onto the core database.
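As a minimal sketch of what that packaging looks like from the application side, here is a query issued over HTTP with Neon's serverless driver; the DATABASE_URL environment variable and the handler shape are assumptions for illustration.

```typescript
// Minimal sketch: query Postgres over HTTP from an environment without raw
// TCP sockets (e.g. an edge function), using Neon's serverless driver.
import { neon } from '@neondatabase/serverless';

// Assumed: DATABASE_URL holds a Neon connection string.
const sql = neon(process.env.DATABASE_URL!);

export async function handler(): Promise<Response> {
  // Each query is a single HTTP round trip; no socket or pool to manage.
  const rows = await sql`SELECT now() AS server_time`;
  return new Response(JSON.stringify(rows[0]));
}
```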
But then there's more than that. Think about how every modern small team builds an application. Right? Bigger teams have DevOps teams and SREs. They stand up their own CI/CD pipelines. But if you're taking things off the shelf, if you're taking GitHub, if you're hosting your front end on Vercel, if you're running your software development life cycle by sending PRs and running tests in GitHub Actions, turns out the database doesn't play nice in that, and we made it play nice. So we have database previews, which is achieved through the technology we call branching. We have the ability to create those previews based on every PR in GitHub, and now we're adding more and more features that integrate deeper with the JavaScript ecosystem. So when you build apps and you need systems like auth or payments or storage, that's also trivial to do on Neon.
So all of that kind of falls under the umbrella that you wanna ship your applications faster. That's really the whole acceleration movement, which is, you know, mostly driven by AI. But really, with developer productivity, you can crank out those apps much faster now, and for that, the infrastructure needs to support them. All of that contributed to the vision that we have at Neon.
[00:20:17] Tobias Macey:
In terms of the engineering that you had to do on Postgres, as you said, Postgres is known for being very pluggable, but the storage engine, at least to date, has not been 1 of those plug in interfaces, though my understanding is that that is changing. I'm wondering how you have had to approach the rework of that Postgres engine to minimize the footprint of your changes while maximizing the capabilities that you're enabling, and some of the ways that the scope and goals of your work on Postgres and Neon have changed from when you first came up with this vision of what you wanted to build to where you are today, where you have a real world production system that people are using every day.
[00:21:08] Nikita Shamgunov:
That's honestly not the hardest part. The specific work that's happening on the Postgres engine is, whatever we can push into an extension, we push into an extension, and the rest we forked. Right? And the way we forked it is we know that this is gonna make it into the core product, either in this form or once pluggable storage engines are introduced. And the amount of changes that we need in the core engine is so small that it's trivial to merge them as a new version of Postgres shows up. So I don't know if it's gonna be Postgres 18 or 19, but by that time, I don't think we're gonna have any differences between Postgres 19 and the Postgres that we run on the platform, and all of that will go into the pluggable storage or extension API.
I think the more interesting question is, like, where do we spend time? Where's the innovation? And the innovation is at both ends. At the bottom, what lives under Postgres: the enormous amount of work that we did building our storage subsystem, which is fully elastic, multitenant, integrates with S3, and we can run it globally, around the world. That's kinda a marvel. The size of that project is similar to that of, like, maybe Pure Storage. That would be a comparable for us, except Pure Storage is an appliance and we are a cloud service. And then another piece of work is above the database.
So not only did we make it serverless via that VM technology, but we also put a very nice developer veneer on top. And that goes into, you know, 1st, it's serverless, 2nd, it's consumable from HTTP. Now it's pluggable into Next.js and all these modern JavaScript frameworks, supports the software development life cycle, and integrates with Vercel. There's, like, thousands of people using our Vercel integration, connecting database previews with Vercel previews, and there's the stuff that is coming down the pipe around authentication, around storage, around payments, kinda like more of a backend as a service platform, not just the database.
And I think that's the right direction for us. While the database itself is very, very valuable, I think fundamentally we're delivering a shorter cycle, speed, to developers. Our slogan is ship faster with Postgres, so that's why we have to take over more of the app to allow our users to ship faster, and that's where a lot of the innovation is going.
[00:24:03] Tobias Macey:
On that note of the branching capabilities, obviously, that maps more closely to the ways that developers think about developing and deploying and debugging features. How has pushing that capability into the database changed the way that development teams approach their iteration cycles and the ways that they think about actually managing their workflows and debugging capabilities?
[00:24:31] Nikita Shamgunov:
I think it all starts with a specific pain. So let's start with the pain. What's the pain? Today, if you run a production system and you wanna stand up a staging environment, you need to move data from the production system into the staging environment. I'm not even talking about how fresh this data is. I'm just talking about give me a snapshot as of, I don't know, yesterday or today, a few hours ago. Give me the snapshot of that data in my staging environment. Well, it turns out, for whichever reason, it's hard to pull off. Right? It's not that easy to do. Now I have my staging environment, but I'm sharing that staging environment with my whole team. Let's say my team has tens of people, maybe hundreds of people.
They all need a staging environment, and they're all changing the schema because they're building the app. Now they're conflicting on resources, and the other centralized resource is just the state of that database. But if you wanna have a staging environment per developer, and God forbid they also test performance, now you have hundreds of copies of that. Not only is it hard to manage, it's also inefficient from a cost standpoint. Now imagine an alternative. Let's just say it's trivial to create a staging environment by creating a branch of the production environment. From there, you may or may not, small teams don't, but larger teams definitely do, mask or override all the PII data. But then once you have that staging environment, how can you have developers create developer environments without breaking the bank? Each developer environment shouldn't cost you very much.
But it should still allow you to run performance tests if you want. And then the other question is, as you develop features and they all conflict on the database schema, how do you make sure that you resolve those conflicts? And this whole thing plugs into your CI/CD pipeline. The fundamental primitive that we have is database previews, which we call branches today, but we're actually changing that language. We're gonna call them previews everywhere. And when you create a preview, it gives you a full copy of your data, data and schema, and it's isolated. So for a developer, it's yours.
Underneath, storage does this smart copy on write thing, where creating a copy is 0 cost. Right? So it's very quick. And then compute is just separate. Right? So it's a different VM that runs Postgres, and that's your compute. So that's the definition of separation of storage and compute, and we're taking advantage of that architecture here. Now in your developer environment, you can do whatever you want. You can change data. You can change schema. You can test performance. You can drop indexes, create indexes, whatever. But then you want to roll these things forward, first into the staging environment and then eventually the production environment. What we've discovered is that people don't really care about the changes in data. As a matter of fact, the data changes in the dev environment should not propagate all the way to production. But the application depends on the schema, so the schema has to migrate forward. There are lots of tools that help you with schema migrations.
Those are called ORMs, things like Prisma, Drizzle, TypeORM, and whatnot. And we're just plugging into that workflow. So we're thinking very hard about both what is our place in the sun and what we should grab from the ecosystem and be orthogonal to. ORMs run migrations within the context of 1 database, but, certainly, it's not in their power to generate database previews and give you this fancy forking capability. So that's on us. But then we package it all such that it's trivial to set up the software development pipeline and life cycle, where for every feature, you go to staging and create your development branch. If you wanna create a fresh 1 every single time, you can just refresh your dev branch from whatever is the current thing in staging. Or maybe you wanna skip staging and just do it directly from production, which is totally fine as well. Develop your feature, send the PR, and from that point on, we got it. So that really speeds up the cycle, to be honest, and we're super excited to see our customers taking full advantage of that technology.
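As a sketch of how a preview branch per PR might be wired up, here is a call against Neon's HTTP API. The endpoint path, payload shape, and response handling are assumptions based on the public API docs and should be verified against the current reference before use.

```typescript
// Hedged sketch: create a database preview (branch) for a pull request via
// Neon's HTTP API. Endpoint and payload shape are assumptions; verify them
// against the current API reference before relying on this.
const NEON_API = 'https://console.neon.tech/api/v2';

async function createPreviewBranch(projectId: string, prNumber: number, apiKey: string) {
  const res = await fetch(`${NEON_API}/projects/${projectId}/branches`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      // Copy-on-write branch off the parent (e.g. staging or production):
      // creation is near-instant and stores no duplicate data up front.
      branch: { name: `preview/pr-${prNumber}` },
      // A compute endpoint so the preview deployment can connect.
      endpoints: [{ type: 'read_write' }],
    }),
  });
  if (!res.ok) throw new Error(`branch creation failed: ${res.status}`);
  return res.json();
}
```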
[00:29:01] Tobias Macey:
With the separation of compute and storage, you're creating another hop for that data to flow. And I'm wondering about some of the ways that you are thinking about the impact on latency, the impact on reliability, and the ways that you're engineering around that problem to get the best of a fully integrated stack of Postgres, where it's all running in 1 unit, but also the scalability, in terms of both compute and storage and pricing, of being able to actually separate those tiers, and the additional layers that you've had to work in to be able to mitigate those latency or performance impacts.
[00:29:38] Nikita Shamgunov:
This is a very fair question. The important thing to understand is that if you run a highly available environment with your classical deployment, where you have 2 or 3 nodes, there is a network hop there anyway. So when you write into, you know, the primary node, the transaction is then sent over a network hop to a replica. And those are, quote, unquote, synchronous replicas. So that write needs to be acknowledged by the replica, and only then can you acknowledge the transaction that you sent to the primary. So in a highly available environment, the hop is already there. If you run Postgres on an EBS volume, well, EBS is network attached as well. So we're not really adding hops. Actually, we do, but at the high level, it's roughly the same number of hops. In reality, there's a Paxos protocol that we use for reliability when we send the log record into our service, into what's called Safekeepers.
So there are multiple hops to persist the record in that protocol. But it's not like you can avoid network hops altogether in some other architecture. You can't. The latencies and throughput are fundamentally becoming roughly the same. And, roughly, there's still a bit of a haircut that we're taking on latencies. But in return, we're giving you infinite IO throughput. Right? Because our storage is multitenant, and we can serve as many pages as you want. So that's the trade off, and it specifically works super well for much larger databases. And for small databases, performance usually is not a problem. So that's the answer to the question of, how do you deal with a network hop, are you strictly worse? And the answer is, well, not really. You have those network hops anyway.
[00:31:38] Tobias Macey:
Another aspect of Postgres from an operational perspective that anybody who has run it for long enough has gotten bitten by is the upgrade process, where you have to deploy the new node, but you have to keep the old version around to be able to do the upgrade of the storage engine, and it's always this complicated dance. And I'm wondering how you're thinking about removing that pain for the end user and some of the ways that you, as a platform operator, are addressing the automation and scalability of that upgrade cycle.
[00:32:00] Nikita Shamgunov:
Coming from SQL Server, and it's been 15 years since I left SQL Server, the fact that Postgres doesn't have online upgrades bewilders me. And the way that the upgrade process is set up in vanilla Postgres is frankly strange. With SQL Server, you just restart. You know, shut down the old binary, start the new binary, point it to the data location, and then it just upgrades on the spot, in place. And the SQL Server team makes sure that the upgrades never fail. They kinda guarantee that this is the case. Here, you have to do a bunch of dancing to upgrade a Postgres instance, but we just treat it as a feature. By the way, we don't have that feature yet, but it is under development. You know, it's not difficult, but think about it. We run a cloud service. There is a playbook for how to upgrade vanilla Postgres. We apply that playbook for our instances. It's trivial for us to stand up a particular version of Postgres in that micro VM that attaches to storage. Of course, we need to do a bunch of manipulations so that storage is in the right format, so you can attach the next version. Yeah. It's a feature. We'll build it. It's not there yet.
[00:33:14] Tobias Macey:
And also from the fact that you are focusing on the developer community, how much does version factor into their end user experience of, oh, I wanna run Postgres? Is it, okay, well, which version do you want? Or is it just, okay, here's the latest, and we'll make sure you stay on the latest?
[00:33:35] Nikita Shamgunov:
We debated that. We let people choose the Postgres version today. I was actually advocating not to. I was saying let's just run the latest version and upgrade ourselves, but then we didn't have the upgrade feature for a while, and we still don't have it. It's coming. So we landed somewhere in between. When a new Postgres version shows up, the default Postgres that we spin up is the latest version. We don't upgrade automatically, and we let people choose up to 2 versions back. And so far, the architecture of our storage allows us to do that. Again, it's a testament to the level where we plugged in. We plugged in at the page level, and pages don't care about the version. So that all works. I think there are benefits to just being on the latest version. I just lost that argument when we were introducing that feature, but we haven't been bitten by it much. And the reason we haven't been bitten by it much is because Postgres is fairly disciplined and regimented in how it releases. It releases once a year.
Not that much stuff changes. Developers, for the most part, don't care as much about being on this version or that version. You know, every now and then, some good developer features like JSON show up, and developers care about those. Otherwise, it's like Linux. Right? Linux kinda works, this version or that version. Only the operator really cares, but the end user doesn't care as much.
[00:35:03] Tobias Macey:
On that note of developer focused features, the topic that has sucked all the oxygen out of the room for everything else in tech is AI and generative models. Commensurate with that is the rise of vector databases. Postgres has the pgvector extension. As somebody who is running a platform as a service for Postgres, what are some of the ways that you're thinking about the utility of, the messaging around, and the impact on your business of pgvector and the ways that it integrates with the Postgres ecosystem?
[00:35:38] Nikita Shamgunov:
Oh, it's been huge. We're actually contributing to pgvector. There's a nice story there. Heikki, I think, is the number 2 contributor. Still much smaller than the creator of pgvector, Andrew Kane, but nonetheless number 2 contributor. We found a way to improve pgvector a year ago. We realized that there is, you know, this index called HNSW, and we thought it should be in pgvector. We didn't have a way to contribute it at the time, so we built an alternative extension called pg_embedding that demonstrated material improvements over the IVFFlat implementation of the index that pgvector had, and still has. Now you can choose. Once we showed the science, Andrew started to work on HNSW and introduced HNSW in pgvector. He's done a great job, and that basically prompted us to retire pg_embedding. And we took all the knowledge that we collected by building pg_embedding.
And what was applicable, we contributed back to pgvector. So that was our experience. From the business perspective, oh, it's wonderful. It's wonderful that this thing is there. We obviously support it. We're contributors. Also, our architecture makes it better to run pgvector on Neon versus other platforms. Specifically, when Neon builds an index, that is a very compute heavy operation, because you do a lot of vector math as you create that index, and it's super memory heavy as well. Neon can temporarily give you more compute and memory on demand and then shrink it back down. So you don't need to commit to very large instances ahead of time when you use pgvector. So that was great.
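For reference, this is roughly what the pgvector HNSW workflow being described looks like, issued here through the serverless driver. The table and column names are made up for illustration, while the index and operator syntax is standard pgvector.

```typescript
// Sketch of the pgvector HNSW workflow; table/column names are made up.
import { neon } from '@neondatabase/serverless';

const sql = neon(process.env.DATABASE_URL!);

async function searchSimilar(embedding: number[]) {
  await sql`CREATE EXTENSION IF NOT EXISTS vector`;
  await sql`CREATE TABLE IF NOT EXISTS items (id serial PRIMARY KEY, embedding vector(3))`;

  // Building the HNSW index is the compute- and memory-heavy step described
  // above; an autoscaling compute can grow for it and shrink back afterwards.
  await sql`CREATE INDEX IF NOT EXISTS items_hnsw
            ON items USING hnsw (embedding vector_cosine_ops)`;

  // Nearest-neighbor search by cosine distance (<=> is pgvector's operator).
  const vec = JSON.stringify(embedding); // pgvector accepts '[0.1,0.2,...]' literals
  return sql`SELECT id FROM items ORDER BY embedding <=> ${vec}::vector LIMIT 5`;
}
```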
People build AI apps. Each AI app needs stuff. Right? 1 of the biggest things, what makes an AI app an AI app, is you talk to an LLM. Well, that's not us. But when you build a RAG application, you do need a vector database, and then the rest of the application chooses Postgres anyway. So that's what we do with Neon, and so far, that's been working great for us. I think there's gonna be more and more demand, and especially as we add more developer features to the platform in addition to just the database, we'll see more demand for AI relevant features. We have a bunch in the pipeline, and we'll be announcing them kinda soon.
[00:38:08] Tobias Macey:
Another element that we've touched on throughout is the fact that everything you're building is being released as open source and permissively licensed. I'm wondering how you think about the relationship between the open source code and your business model and the overall sustainability of both.
[00:38:30] Nikita Shamgunov:
So we're exposed to hyperscalers. I don't think we're exposed to anybody else. In order for you to run a service like Neon, you need to have several pieces of expertise. 1 is, well, you need to understand what's written in the code. It's very scary to run somebody else's systems code. And for an operational database, if there is a bug, you need to fix the bug. So you need to build that expertise. You also need to, you know, stand it up, set up all the observability and upgrade systems, basically set up the processes that allow you to run it well, and then you need to build a team of committers that touches every part of the stack.
So for a startup, that's next to impossible. For a large company like Amazon, Microsoft, or Google, it is possible. It's also possible to build this whole thing from scratch for, frankly, Amazon, Microsoft, and Google, and Amazon has already done it with Aurora. Microsoft is a little behind, and that's why we are actually partnering with Microsoft. And then, you know, tune in to some announcements. And then Google has a project called AlloyDB, which I think is just behind us with the stuff that we can do. So I don't know. It is possible to, quote unquote, steal it, but only a handful of companies actually can.
In the US, it's Amazon, Google, and Microsoft. We're partnering with Microsoft, and Google and Amazon have already done it. So I think we're good.
[00:40:09] Tobias Macey:
As you mentioned, your previous company was another database company. I'm wondering what are the lessons that you learned in the process of building and growing SingleStore, previously known as MemSQL, that have been most useful to you in the work that you're doing on Neon?
[00:40:27] Nikita Shamgunov:
The first 1 is focus. Right? We did too much at SingleStore. You know, it's north of a $100,000,000 run rate business. So we didn't fail, but we didn't take it public yet, and it's been some time. And if you kinda zoom in on the reason, it's that we did too much. We were on prem and we were in the cloud. We were supporting operational workloads and analytical workloads, and the problems were different in each 1. And a shared nothing architecture, both for analytics and operational workloads, breaks compatibility with the mothership. For us, the mothership was MySQL.
We didn't use MySQL code because it's GPL, but we used the MySQL protocol and syntax, and then it turned out all the subtle bugs and the compatibility are something that I have a lot of scars from. So we're certainly fixing that with Neon. We're not breaking compatibility. And then on the analytics side, well, cloud and object stores were something that we ignored for a while and then eventually caught up on, but that was kinda too late. So I think the big 1 is focus, and then driving very, very hard towards becoming the default. Maybe that will take some time, but for the outside observers, and then later for customers and partners, it should be very, very clear where you're going. We want to become the default development platform for Postgres, and our architecture and marketing follow from that.
So if you do all the top line right, your technology is very, very solid, your positioning is very, very solid, your developer experience is solid, your design is solid, then the bottom line kinda follows, and that's our intention. I think at SingleStore, well, I personally had a lot more energy. I lived in the office and slept next to the servers, but I lacked that maturity and focus, which I'm bringing here at Neon.
[00:42:43] Tobias Macey:
In the work that you're doing on Neon and the ways that you're seeing people use it for their own use cases, what are some of the most interesting or innovative or unexpected ways that you're seeing Neon used?
[00:42:54] Nikita Shamgunov:
The stuff that we didn't expect, and people do it a lot, is using 1 database, 1 instance, per tenant. They're like, well, they're kinda cheap. They stand up in a couple 100 milliseconds. I'm just gonna run a full blown database server, full Postgres, for 1 user. And we didn't expect that. Now there are companies that run fleets of instances, and it works actually really well if you have uneven consumption on a per client basis. You have a long tail of customers that barely use it. Okay. Well, you're basically paying 0 for those with Neon. And then some are using quite a bit, and for that, you need elastic compute.
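Here is a sketch of that database-per-tenant pattern, provisioning an isolated instance per customer through Neon's HTTP API. As with the branching example, the endpoint and response fields are assumptions to check against the API reference.

```typescript
// Hedged sketch: one isolated Postgres project per tenant. Endpoint and
// response fields are assumptions; verify against the current API docs.
async function provisionTenantDb(tenantId: string, apiKey: string) {
  const res = await fetch('https://console.neon.tech/api/v2/projects', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // Scale-to-zero means idle tenants in the long tail cost almost nothing.
    body: JSON.stringify({ project: { name: `tenant-${tenantId}` } }),
  });
  if (!res.ok) throw new Error(`provisioning failed: ${res.status}`);
  const body = await res.json();
  // Assumed response field carrying the tenant's connection string.
  return body.connection_uris?.[0]?.connection_uri as string | undefined;
}
```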
So it allows you to right size your usage very well. The second thing that we didn't expect at all is that people run what if scenarios with our branching capability. There's a specific financial planning product where every customer is exploring the impact of certain changes. Oh, well, we'll create a branch and then go hog wild on the branch, change data, you know, reads, writes, whatever, and then compute the final result that tells you whether the what if scenario is successful or not. If it's successful, you proceed. If not, you throw out the branch. That was another unexpected thing. I didn't even know that scenario existed. The rise of our serverless driver was another surprise to me. It turned out JavaScript developers don't know what a connection is and what a socket is, and I think it's great, actually.
Like, nobody needs to know. So, you know, JavaScript engineers consume Neon using our serverless driver that allows you to query Neon over HTTP. That was another interesting surprise. What else? Yeah. Those are probably the 3 most interesting ones. pgvector caught us by surprise as well. Very quickly after we launched, people were like, do you have pgvector? And so that became kind of the expected standard.
[00:45:10] Tobias Macey:
And in your experience of building the Neon product, scaling the business, what are the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:45:20] Nikita Shamgunov:
I think we've learned to automate a lot here. We actually don't scale anything with people. Everything is scaled by technology. So, for example, there's a feed on my Slack for new customer upgrades. You know, people start with the free tier, and when they upgrade, I get a notification on the feed. There's telemetry on everything. And then there's a data team that is very busy producing all of that for us. We make decisions based on data, and we think about the world as kinda like this real estate. We know we have a good product, so we just need to claim that real estate in the world and then measure everything.
So that, I think, was also a difference between operationally running Neon versus running SingleStore. Also, Neon is fully remote, which allows us to tap into unbelievable talent around the world, at the cost of communication overhead. So I still don't know which 1 I like better. It's very fun to go to offsites and meet the team, all these people who do incredible work, and see them in person. And at the same time, you know, you do pay the communication overhead for people working from home. So, yeah, those are the learnings.
[00:46:51] Tobias Macey:
As you were talking about the data that you're collecting, the dashboards that you're building, obviously, to run a business, you need a database. It makes sense that you would use Neon internally as well. What are the cases where you're actually not using Neon and you need to turn to a different data engine?
[00:47:08] Nikita Shamgunov:
We have at least 2 more data engines, maybe more. 1 is powering our Grafana dashboards, because for observability, you need an observability engine. And then we use Snowflake for all the reporting. We are at 60 terabytes of data in Snowflake, and this number is shocking to me. Like, how come tiny Neon generated so much data? But, you know, we are where we are. Postgres is just not good for that. It's interesting that there's more and more work going into Postgres to support data and analytics scenarios. It's gonna be a while until it's gonna be like a full fledged data warehouse.
But think about it. Postgres is like Linux. It's a commodity for us. And a vectorized column store query processor is the future at the end of the day. You know, SQL Server and Oracle have those. So we'll have them in Postgres too, and that's coming. And integration with the data lakes is coming as well. There are already plugins that people discuss on Hacker News that provide such functionality, that allow you to query Parquet. And in the future, it's gonna be Iceberg integration, Delta and Parquet integration. So all of that is coming into the platform.
Where it stands today, though, for analytics, it's good for small scale. It's not very good for our scale. And for that, we use Snowflake.
[00:48:32] Tobias Macey:
The cases where Postgres is not the right choice is something that many people have already discussed in various contexts. But for the case where Postgres is the right choice, when is Neon not the right way to run Postgres?
[00:48:47] Nikita Shamgunov:
Well, there is the meme going around of just use Postgres. And I think it's great for us. I think it's great for the industry. Again, I think the era of lots and lots of database engines that are built for purpose is coming to an end. You certainly still need, you know, a data warehouse or a data lake, and history will tell if actually all you need is a data lake, or you need a data warehouse and a data lake. We'll see. And then you need an operational database, and that's Postgres. Then there's, like, all the other things that you potentially need as well.
I think over time, they will all go 1 way or another, meaning they will either be part of the data warehouse or they will be part of an operational database. Whether operational databases and analytical databases are gonna become 1, that I don't know. I tried with SingleStore, and again, we scaled that to close to $100,000,000 in run rate. I don't think we had enough of an industry impact to say you just need 1. Maybe. Maybe 1 day. But I still don't know, even after 12 years of SingleStore, whether it's 10 years out or 20 years out. It's certainly not 2, 3, 4 years out, because not only do you need to build the technology, you need to change how people build software, and that's a tall order.
So I would say don't use Postgres for large scale analytics today, and don't use a data warehouse to power your operational apps, for OLTP. If you have something in between, you can decide which way it goes. Everything else will kinda be pulled into 1 of those 2.
[00:50:43] Tobias Macey:
And as you continue to invest in Neon, what are some of the things you have planned for the near to medium term or any projects or problem areas you're excited to explore further?
[00:50:54] Nikita Shamgunov:
Oh, there's a ton. So more clouds for sure. We want to be the default Postgres offering everywhere in the world, and we're gonna launch another cloud this year. So I'm super excited about that. That's 1. We're gonna add more developer features. We're gonna make it much easier to build and manage auth, payments, and storage with Neon, and we'll do it with some partners. So super excited about that as well. We're gonna launch our GitHub app, which will give you a much tighter integration with CI/CD and automatic creation of previews.
We're gonna start bridging, not replacing, but integrating Neon with the data lakes. So that's another super exciting part. And then we're gonna launch more platforms, meaning not our platforms, but other platforms that use Neon as the default database provider. So, yeah, that's a lot. I'm excited to pull all of that off and launch well, with high production quality. So I'm looking forward to all that.
[00:52:07] Tobias Macey:
Are there any other aspects of the Neon project and the business that you're building around it or the Postgres ecosystem that we didn't discuss yet that you would like to cover before we close out the show?
[00:52:18] Nikita Shamgunov:
1 thing that is not obvious to everybody is how few people actually move the Postgres project forward. And there is a certain amount of aging happening among the core contributors to Postgres. So I think about what would be very useful for the industry, not just for us, and we are contributing in a small way. We have a Postgres team, and Neon engineers contribute to the core Postgres project even in places where it's not obvious how that benefits Neon, outside of just, well, Postgres gets better. So we do some of that work, and Heikki continues some of that work, and he reviews patches.
Right? The industry should train more people who are Postgres kernel engineers because of that aging problem. The absolute top contributors to the Postgres engine are now in their fifties and sixties. So it would be nice if more systems engineers from around the world, younger systems engineers, started to contribute to Postgres. This is a call for engineers and also a call for the industry to sponsor this work. And the best way to sponsor this work is, if you have a high dependency on Postgres, if you're running lots of Postgres instances in production, it's not that expensive for a big business to have some of those engineers contribute to the Postgres kernel.
[00:53:48] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:54:06] Nikita Shamgunov:
Well, can I talk about AI?
[00:54:09] Tobias Macey:
Absolutely. I think
[00:54:12] Nikita Shamgunov:
well, we stood up a data team at Neon, and even for a small company like ours, it wasn't easy. I think there should be an AI data engineer that can kinda, like, put it all together for you, and I'm describing that problem in a very broad way because I don't wanna lead to the answer. Right? This is definitely not text to SQL. Text to SQL is like a tiny piece of that problem. The problem is, I don't have a data practice. I stand up this thing, and that thing figures out and stands up the data practice for me, and acts like a human, and that's tricky. But I think it's possible, because now we see these systems like Devin from Cognition Labs, you see all this, like, AI engineer type work. We see magic.dev, and the stuff that people are showing is quite magical. So you're like, okay. Well, it's coming.
I think that's 1 of the things that data management is missing. You can go further. You can say, well, a data warehouse is a gigantic calculator. Right? It's a gigantic calculator, but in order to take advantage of that calculator, you need to really organize data. Put it into tables and columns, obsess about the schema, understand it has a semantic meaning to it. But imagine a gigantic brain that you can just shove data in, and that thing makes sense of that data and then answers business questions. So I don't know what this means for the future of data warehouses as they exist today.
Now, again, I'm thinking a little bit far ahead on this, but if we dream a little bit, then we may find unusually different architectures for data and analytics that are fully AI driven. And not just AI on top of a data warehouse, but maybe changing the architecture of the whole data warehouse. But, you know, we'll see.
[00:56:28] Tobias Macey:
It's definitely a very interesting future, and I'll be excited to see how it develops. So thank you very much for taking the time today to join me and share the work that you and your team are putting into Neon. It's a very exciting project, or very exciting product, so I'm excited to see the ways that it continues to develop. Thank you again for all the time and effort that you're all putting into that, and I hope you enjoy the rest of your day.
[00:56:54] Nikita Shamgunov:
100%. Thank you so much.
[00:57:02] Tobias Macey:
Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Episode Overview
Guest Introduction: Nikita Shamgunov
The Neon Project: Overview and Genesis
Building a Serverless Postgres
Focusing on Developers and Serverless Architecture
Developer Experience and Database Previews
Separation of Compute and Storage
Handling Postgres Upgrades
AI and Vector Databases
Open Source and Business Model
Lessons from SingleStore
Scaling Neon and Data Management
Future Plans for Neon
Postgres Ecosystem and Community
Biggest Gaps in Data Management Tools
Closing Remarks