Neon: A Serverless And Developer Friendly Postgres

Hello, and welcome to the Data Engineering podcast, the show about modern data management.

Data lakes are notoriously complex.

For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end to end data lake has platform built on Trino, the query engine Apache Iceberg was designed for.

Starburst has complete support for all table formats, including Apache Iceberg, Hive, and Delta Lake.

And Starburst is trusted by teams of all sizes, including Comcast and DoorDash.

Want to see Starburst in action? Go to data engineering podcast.com/starburst

today and get $500 in credits to try Starburst Galaxy, the easiest and fastest way to get started using Trino.

Your host is Tobias Macy, and today I'm interviewing Nikita Shamkhadev about his work on making Postgres a serverless database at NEON. So, Nikita, can you start by introducing yourself?

Great being here. My name is Nikita. I'm working,

like you said, on on serverless Postgres.

It's been a very fun journey.

We started

3 years ago and changed on March 1, 2021

with 3 guys in a slide deck.

And now Neon is a company with a 100 people and hundreds of thousands of databases on management.

And do you remember how you first got started working in data? Yes.

I think I always was fascinated

with databases.

I started to use PHP MySQL,

while,

still going to college because I was kinda moonlighting and and trying to make, a little bit of money while while studying computer science

back home in Russia.

And then my first

real job, like, real real job was at SQL Server,

which is a flagship database product at Microsoft.

Before I joined there, I had no idea what how large scale systems are being built. And I really cut my teeth on database architecture,

fundamentals,

how to get quality

at SQL Server. So that was my first intro.

Now

in terms of the Neon project, can you give a bit of an overview about what it is that you're building and some of the story behind how it came to be and why you decided that you want to spend your time and focus on it?

I actually thought about Neon

while working on SingleStore, my previous company.

And SingleStore was

designed

to be a scale out system,

and we had a very ambitious vision of becoming the the system that can run both transactional

workloads and analytical workloads and all

kinda globally

distributed system. Lots of learnings, certainly on,

on that project.

And then at some point, I I was seeing the rise of Postgres because every single customer or single store, and we had very large customers,

had Postgres somewhere.

Right? They would put

some, like, really large scale workload usually either an analytical workload or a real time analytics workload on single store,

and a part of the data feed that was going into that would always come from Postgres.

And so I was just seeing Postgres everywhere.

Also at the same time,

my

prior mentor

at SQL Server, whose name is Alex Rubitsky,

was telling me about this exciting project that he was working on at Inside AWS.

That project eventually launched and became AWS Aurora.

And I saw,

a very, very fast growth of adoption of Aurora.

And with single store, every time we would go into, like, application migrations,

that was frankly a shit show. Right? Because you have an application built against 1 engine,

and then you're trying to, you know, move the application from 1 engine to another.

Turns out there's all these little quirks

that prevent you from moving the application over.

And then when, you know, Aurora came out and I read about the architecture,

I was like, wow.

That's an interesting proposition.

It you don't lose the surface area at all. The surface area is exactly the same of the database product, but you have this additional benefits

that you get through the separation of storage and compute.

So and I started thinking about it, and I couldn't stop thinking about it. It. That was that was an interesting

artifact

of me. Obviously, I was running your company and I didn't have that much time to to build a side project there,

and it didn't make sense to do it inside single store, but I just couldn't stop thinking about it.

And I spent many years in that idea maze where I was like, well, if I were to build a competitor to Aurora, what am I gonna do exactly?

How can I take advantage of open source? How can you take advantage of

cloud distributions?

What does it mean to be a developer versus infrastructure offering?

When I left SingleStore, I joined Khosla Ventures.

By that time, the idea was more or less formed in my head

up to a point, enough of a point to start a company,

and certainly not enough to, you know, fully plot the future path of how this is gonna be successful.

And and walking in, I told Vinod Khosla, who runs Khosla Ventures and the founder of Khosla Ventures,

that

I have this idea in me. What do you think? Maybe we can prototype it.

And, Vinod says, yeah. For sure. You know, we we absolutely need to incubate it. How much money do we need? And I said 10, 000, 000, and he said, here is 5.

So

we got to work. And in this company, we engineered

the team.

1 thing you'll learn a bunch of stuff in in venture as well.

And 1 of the things Vinod always says is the team you build is the company you build.

And in a way, I started to think

about Nian

not in terms of the plan,

but in terms of the team. Who do you have

on the team? Who do you need on the team

defines what actually the plan is gonna be and what kinda what kinda product this is gonna become.

And so this company was engineered around people who are very post trust native

because I knew for sure. I knew 100%

that if you diverge and you're not Postgres anymore, you're something else. You know, you go buy PlanetScale,

CockroachDB,

whatever that might be. Doesn't matter if you speak Postgres protocol.

You either Postgres or you're not.

And so this company was built around people who contribute to the core Postgres engine. KKL in a congress as a Postgres Committer,

and Stas, was a Postgres commune contributor.

So that that became the foundation

of, of NEM.

Then

in the beginning,

the only insight was that every successful cloud product

inside a hyperscaler has an open source alternative.

And lots and lots of examples of that. Right, you know, for Redshift, that would probably be ClickHouse now these days. That's 1 of the more or dot DB, 1 of the more popular open source products.

But then,

also Redshift had an alternative, a cloud native alternative, which is Snowflake. And Snowflake frankly out executed

Redshift.

So unbundling a popular database service

seems like a good idea. And I was also thinking about GitHub versus GitLab analogy,

where GitHub is a cloud product and GitLab is an open source product.

And when you're an open source product that allows you, gives you the right to exist.

And I was like, nobody is building an alternative to Aurora.

And and and that was strange to me. I even reached out to Alex and said, why don't we Alex Rubitsky,

who, you know, in the 1st few years wrote the most code on that project, is

like, why don't we incubate this thing together? But I always wanted to stay at Amazon,

and I couldn't find other folks that would do a great job on it. And so

I couldn't stop thinking about it. And so eventually, we thought about, okay, we're gonna just launch an open source Aurora.

But then it was also obvious that

the the money is in the cloud,

and and building a cloud only product

is

is what creates a lot more focus

as opposed to creating a cloud product and and and an on prem product.

So what we did is we kinda claimed our open source real estate

by announcing to the world that, hey. This is this code base, and this is open source Aurora

or an open source alternative to Aurora. And the code is open, and the code is under a good license.

So, you know, anybody can watch it, anybody can adopt it.

But we only focused on

building a cloud product that consumes this code but delivers this as a service.

So that was kind of the second insight, and and that was the plan from the start to to only be a cloud product.

The 3rd and the 4th insight

came as we started working on that.

I learned that,

you know, there's lots of things you can do with the database technology.

You can work on

mega resiliency for,

you know, multi region deployments.

You can work on multi master, you know, increase wide throughput because, you know, database kind of bottleneck on on the amount of rights you can you can send through them today.

You can work on analytics.

And what would be this defining feature of your product vis a vis what you can get off the shelf from Amazon?

And then we realized that serverless was kinda a big deal,

and that's when we made a decision

to delay our release, and then delayed our release by at least 6 months

and ship it serverless only.

And that work only is something that I learned competing with Snowflake, where frankly, single store

did too much. We have cloud and on prem. We

OLTP and analytics,

and that prevented us from from, like, truly competing for what became a big category, right, data and analytics in the cloud.

So here, I don't wanna make the same mistake.

So so we focused and we said, Postgres only, cloud only,

serverless only.

So that was kind of the the next big insight.

Finally,

what we're realizing now,

is,

well, our user is a developer,

and developers have lots of needs.

We asked ourselves the question, why people use Nian versus

AWS or Azure or GCP architecture?

And the answer was always kinda like, oh, it's easy to use. You push a button and you get it.

I think the real answer is that small teams that need to move fast

don't have the luxury of having DevOps.

And if you're using Amazon, you need DevOps. Right? Because Amazon is infrastructure. Amazon is not a developer platform.

When you use GitHub, you don't need DevOps. Right? You know, you as a developer consume that feels super native for you as a developer.

But when you use e c 2 or, you know,

200 services on AWS,

you feel like

it's like Lego bricks on which you build your application,

and it doesn't feel like this is built for developer

as an end user consumption.

So

so that moment was like,

oh, this is what it means. Smaller teams can move faster

because they don't have DevOps,

and they consume this this directly.

Once that clicked, we're realizing that those teams need more than just a database.

And,

if you tune

in to Nian, you will find that there is more and more technology we'll be shipping,

that is more on

a database plus plus database plus more that we'll be shipping.

1 of the interesting points that you brought up is that

Aurora

had a lot of interesting capabilities

and functionality of being able to provide this serverless experience, scale to infinity. You don't have to worry about provisioning the number type of instances.

You just throw data in, and it does what it's supposed to do.

The problem that I've seen, though, is that it is not an exact

actual Postgres or an exact MySQL. There are enough edge cases that

if you are using it for anything even remotely nonstandard, you're gonna hit problems, and you can't just use it as a complete drop in replacement for MySQL or Postgres.

And so your insight of saying that

if you're going to do this right, it has to actually be Postgres through and through,

I think, is very salient and

very well thought through.

And so

given the fact that Postgres is a very large and diverse ecosystem,

a lot of different

use cases that it's supporting, a massive number of different plug in types.

I'm wondering if you can talk to some of the ways that you're thinking about

what it means to be serverless

for such a diverse ecosystem.

What are some of the ways that you're trying to scope the applicability

of Neon

so that you don't have people coming to you and complaining that, oh, it doesn't do x, y, or z because I'm trying to use these 15 different plug ins

and some of the ways that you're orienting towards that developer experience by removing the operational concerns.

Well, I think

there are 2 questions in 1 here. 1 is,

how do you maintain compatibility

with Postgres

in the,

where the reality is that the ecosystem is so deep.

So what are you changing with Postgres, and what are you not changing,

and what are the,

net effect of those changes,

with regard to compatibility with the ecosystem.

And like I said earlier in the call,

the compatibility

with Postgres is like paramount.

And if you break it, you're on an island. Right?

At SQL Server, we used to say

99% compatibility means 99% of your customers have problems.

So the compatibility needs to be a 100%.

So

from the architecture standpoint,

and you can look at the architecture of Nian

in our documentation,

we we don't hide the fact that we actually run Postgres. Right? We run Postgres in ADM,

and we attach that Postgres into

custom build storage,

and that thing we build from scratch.

So the integration point with Postgres with our storage

goes on a relatively thin API.

You know, at the end of the day,

post a storage engine, request pages from disk, and then

write transaction log record. I'll call wall

on disk,

and then update use wall to update those pages both on disk and in memory.

And that's precisely where we interject it.

So we

said, instead of writing a transaction log record on disk, send it over the network into our service,

and then instead of requesting a page or reading a page from disk,

read it from our service over an API call.

And just just as you see,

you know, that allows us to actually not change the engine.

Now the reality is, well, you still need to change the engine,

because, you know, while this

looks very, very good on paper, then Deville is in the details, and then there isn't a pluggable storage engine support at postscripts. So we had to do a little bit of surgery.

The important bit is the amount of that surgery

is not huge,

and that allows us to keep the compatibility.

So that's how you attach Postgres to our storage. Now what about the serverless bit?

So the way serverless works is we run Postgres in the VM,

and, we change the size of that VM

and add more memory and CPU to the VM based on the workload

and then remove memory CPU from the VM if the workload doesn't require as much memory of CPU.

So also on paper, that that sounds wonderful and easy to do. Well, turned out that there's a lot going there. So so we had to build a lot of the VM expertise

internally at NEM to support that.

We also thought about running compute nodes and containers.

Well, they're not really isolated, and those risks can be hacked, or broken out of the process,

which is not ideal.

So we needed that security boundary.

In addition to that, we wanted our VMs to be able to to change hosts.

And Postgres is a stateful system

while state lives in our most storage,

but even the connection

to Postgres is stateful. You interact with Postgres by establishing a TCP connection.

So if you move your

your container from 1 VM to another, you break that connection.

VMs, you can actually log migrate, and even the TCP connection remains. So that was kind of the the second reason for us to use VMs. And now there's a ton of VM expertise because we, you know, run 100 of thousands of them all at the same time

on on our platform.

So, basically, the answer, how do you deal with the, you know, massive ecosystem of Postgres

is, well, through that architecture, we don't break out compatibility because the engine itself is still Postgres.

It's just for swapping out storage,

from under Postgres, and the API to storage is so

small,

that it doesn't impact

app compatibility.

So developers don't suffer,

but do they thrive? That's another question.

So and 1 of the things that developers need to thrive

Well, some of this stuff is silly. You you know, you go and,

launch an RDS instance. It's not connected to the Internet.

And if you run CloudFlare Workers, it's a gigantic pain to connect

your application to the database.

You know, certain things don't support TCP connections, so that's why we launched our serverless driver.

Postgres

doesn't do very well with lots of connections, and therefore, there are systems like poolers,

our PG balancer that allows you to scale number connections to Postgres.

So part of the value is just packaging all of that and make it as stupid simple to consume, never run out of connect connections,

and never do operations that you have to do without the systems

adding infrastructure

to just the core database.

But then there's more than that. Every application

that a modern small team does. Right? Bigger teams, they have DevOps teams, SREs. They they stand up their own CICD pipelines.

But if you're taking things off the shelf, if you're taking GitHub, if you're storing your front end on resell,

if you're running your software development life cycle by by sending PRs and and run tests and GitHub actions,

Turns out

database doesn't play nice in in that,

and we made it play nice. So we have database previews,

which is achieved through the technology with whole branching.

We have

the ability to create those previews

based on every PR

in GitHub,

and now we're adding more and more features that would integrate

with

a deeper

JavaScript ecosystem. So when you build apps and you need things systems like auth or payments

or storage, that's also trivial to do on the end.

So all of that

kind of falls under the umbrella that you wanna ship your applications faster.

That's really the whole acceleration movement,

which is, you know, mostly driven by AI. But really, by developer productivity, you can crank those apps much faster now, and for that infrastructure needs to support

them. That that all of that contributed to the vision of of that we have at Nian.

In terms of the

engineering that you had to do on Postgres,

as you said,

Postgres is known for being very pluggable,

but the storage engine, at least to date, has not been 1 of those plug in interfaces, though my understanding is that that is changing.

Wondering how you have had to approach

the rework of that Postgres engine to minimize the footprint of your changes while maximizing

the capabilities

that you're

enabling

and some of the ways that the scope and goals of your work on Postgres and Neon have changed from when you first

came up with this vision of what you wanted to build to where you are today, where you have a real world production system that people are using every day?

That's

honestly, it's not the hardest part.

The the specific work that's happening on the Postgres engine

is

whatever we can push into the extension, we push them to the extension,

and then the rest we we forked. Right? And the way we forked it is

we know that this is gonna make it

into the core product, either in in this form or once the pluggable storage engines

will be introduced.

And the amount of changes that that that we need in the core engine

are so small that it's trivial to merge them as the new version of Postgres shows up.

So

I don't know if it's gonna be Postgres 18 or 19,

but by that time, I don't think we're gonna have any differences

between Postgres 19 and Postgres that we run on the platform,

and all of that will go into the pluggable storage or extension API.

I think

the more interesting question is, like, where do we spend time? Like, where where does where's the innovation?

And the innovation is at the e

at the both at the bottom, what what lives from under

Postgres

that,

you know, enormous amount of work that we did by building our, storage subsystem

that is

fully elastic, multitenant,

integrates with s 3.

We can run it globally,

around the world.

That's kinda a marvel. The size of that project is similar to that of, like,

maybe pure storage.

It would be a comparative,

for us,

except for Pure Storage and Appliance and and and we are cloud service. And then,

another piece of work is above the database.

So not only we made it serverless, what via that VM technology,

but we also put a very nice developer veneer on top.

And that goes into,

you know, 1st, it's serverless, 2nd, consumable from HTTP.

Now it's pluggable into, like, Next JS, all this, like, modern JavaScript framework,

supports software development life cycle, integrates with Vercel.

There's, like, thousands of people using our Vercel integration and and and you

connecting database previews with Vercel previews

and the stuff that is coming down the pipe,

around authentication,

around storage,

around payments,

kinda like more of a backend

as a service platform,

not just the database.

And I think that's that's the right direction for us.

While the database itself is very, very valuable, I think fundamentally, we're delivering in a shorter cycle speed to developers.

And our slogan is ship faster with Postgres, so that's why

we we have to take over more of the app

to allow our users to ship faster,

and that's where a lot of innovation is coming into.

On that note of the branching capabilities,

obviously,

that maps more closely to the ways that developers think about developing and deploying and debugging features.

How has that capability being pushed into the database

changed the way that development teams approach their iteration cycles and the ways that they think about actually

managing

their workflows

and debugging capabilities?

I think it all starts

with a specific pain. So let's start with the pain. What's the pain? Today, if you run a production system and you wanna send up a staging environment,

you need to move data from the production system into the staging environment. I'm not even talking about how fresh this data is. I'm just talking

about give me a snapshot as of, I don't know, yesterday or today, a few hours ago. Give me the snapshot of that data in my staging environment.

Well, turned out, for whichever reason, it's a hard pull. Right? It's not that easy to do. Now I have my staging environment, but I'm I'm

sharing that staging environment with my whole team. Let's say my team has tens of people, maybe hundreds of people.

They all need a staging environment and they all change in schema because they're building the app. Now they conflicting

on both their resources,

but then the other centralized resource is just the state of that database. So you can't

have a

but if you wanna have a staging environment on the per developer, and God forbid, they also test in performance, now you you have hundreds of copies of that. Not only hard to manage, it's also inefficient from the cost standpoint. Now imagine an alternative thing. Let's just say it's trivial to create a staging environment

by creating a branch of the production environment. From there, you may or may not,

small teams don't, but larger teams

definitely

do mask or

override all the PII data. But then once you have that staging environment,

how can you have developers,

you know, create developer environments without breaking the bank? So each developer environment shouldn't cost you very much.

And then but still allow you to run performance tests if you want. And then the other 1 is how do you as you develop features and they all conflict

on the database schema, how do you make sure that you resolve those conflicts? And this whole thing plugs into your CICD pipeline. The the fundamental

primitive that we have is database previews,

which we call branches today, but we actually change in that language. We're gonna call it previews everywhere.

And when you create a preview,

it gives you a full copy of your data,

data and schema,

and then it's isolated. So for for developer, it's yours.

Underneath,

storage does this smart copy and write thing, where creating a copy,

is 0. Right? So it's very quick. And then compute is just separate. Right? So it's a different VM that runs Postgres,

and that's your compute. So that's the definition of separation of storage and compute

and taking advantage of that architecture here. Now in your developer, you can do whatever you want. You can change data. You can change schema. You can test performance. You can drop indexes, gradients, whatever. But then you want to roll these things forward

into

first stage and environment, and then eventually production environment. What we've discovered is that people don't really care about

the changes in data. As a matter of fact, the the data changes

in the dev environment should not propagate it all the way to the production, But the application depends on the schema, so schema has to to migrate forward. There are lots of tools

that help you with schema migrations.

Those are called ORMs.

Things like Prisma, like Drizzle, Type ORM, and whatnot. And we're just plugging in into that workflow.

So so we we're thinking very hard both

of what is our place on the sun and what we should, you know, grab from the ecosystem and be orthogonal to to that.

So ORMs run migrations within the context of of 1 database,

but, certainly, they're it's not in their power to generate

database previews and give you this fancy forking capability. So that's on us. But then we package it all such that it's trivial to set up the, the software development

pipeline and and and life cycle,

where for every feature,

you go to staging,

create your development branch. If you wanna create a sing,

1 every single time, you can just refresh

your dev branch from whatever is the current thing in staging. Maybe you wanna mid staging, just do it, directly from production, which is totally fine as well. Develop your feature, send the PR,

and from that point on, we got it. So

that really speeds up the cycle, to be honest, and, we kinda super excited to see

our customers,

taking full advantage of of that technology.

With the separation of compute and storage,

you're creating

another hop for that data to flow.

And I'm wondering some of the ways that you are thinking about the impact on latency,

the impact on reliability, and the ways that you're engineering around that problem to get the best of a fully integrated stack of Postgres where it's all running in 1 unit.

But the scalability

and in in terms of both compute and storage and pricing of being able to actually separate those tiers and the additional layers that you've had to work in to be able to mitigate those latency or performance impacts?

This is a very fair question. The important thing to understand is

that if you run a highly available environment with your classical deployment when you have 2 or 3 nodes, there is a network hop there anyway. So when you ride a when you ride into, you know, the master node to the primary node, the transaction

is

then over a network hop

sent to

a a replica. And those are, quote, quote, synchronous

replicas.

So that write needs to be acknowledged by the replica, and only then you can acknowledge the transaction that you sent to the primary. So there is an, in highly available environment, the hop is already there. If you run Postgres on an EBS node, well, EBS node is network attached as well. So we're we're

not really

actually, we do, but, like, at the high level, it's roughly the same number of hops. While the reality, there's a Paxos protocol

that we use

for reliability when we send the log record into our service that's called Safekeepers.

So that have multiple hops to to persist the the record in the access protocol. But it's not like you can avoid network hops altogether in some other architecture you can't. The latency is fundamentally becoming the latencies and throughput are becoming roughly the same. And roughly, there's still a bit of a haircut that we're taking on on latencies.

But in return,

we're giving you infinite IO throughput. Right? Because our source is multi tenant,

and, you know, we can request as many pages as you want. So that's the trade off, and specifically works super well for much larger databases.

And for small databases,

performance usually is not a problem. So that's the answer for

for the question of, so what do you how do you deal with a network cop? Are you strictly worse? And the answer is, well, not really. You have those network cops anyway. Another aspect of postgres from an operational perspective that anybody who has run it for long enough has gotten bitten by is the upgrade process where you have to

deploy the new node, but you have to keep the old version around to be able to do the upgrade of the storage engine, and it's always this complicated dance. And I'm wondering how you're thinking about

removing that pain for the end user and some of the ways that you, as a platform operator, are addressing the automation and scalability of that upgrade cycle?

As coming from SQL Server, and it's been

15 years

since I left this SQL Server,

the, the fact that it doesn't have online upgrades bewilders me. And the way that the upgrade process is set up in Vanilla Postgres is frankly strange.

SQL Server, you just restart.

You're like, you know, shut down the old binary, start a new binary, point to the data, location, and then it just upgrades on the spot in place. And, the SQL Server team makes sure that the upgrade upgrades never fail.

They just, like they kinda guarantee

that this is the case. Here, you have to do a bunch of dance to upgrade a Pogo instance,

but we just treat it as a feature. By the way, we don't have that feature yet. But this feature is under development. You know? It's not difficult, but think about it. You know, we run a cloud service. There is a playbook of how to upgrade Vanilla Postgres. We apply in that that playbook

for for our instances. It's trivial to us

to stand up a particular version,

of Postgres,

in in that micro VM that attaches to storage. Of course, we need to do a bunch of manipulations

so that storage is in the right format, so you can attach the next version. Yeah. It's a feature. We'll build it. It's not there yet.

And also from the fact that you are focusing on

the developer community,

how much

does version factor into their end user experience of, oh, I wanna run Postgres. Is it okay? Well, which version do you want? Or is it just, okay, here's the latest, and we'll make sure you stay on the latest? We debated that. We let people choose the version, the post test version

today.

I was actually advocating to not.

I was saying let's just run the latest version and upgrade ourselves,

but then we didn't have the upgrade feature for a while, and we still don't have it. It's coming. So we landed somewhere in between.

So when the new Postgres version shows up, the default,

Postgres that we spin up is the latest version. We don't upgrade automatically, and we'll let people choose up to 2 2 versions back.

And so far, the architecture of our storage

allows us to do that. Again, it's a testament to kind of the the level where we plugged it. So we plugged in at the page level and pages don't care about the version. So so that all works. I think there are benefits to just being on the latest version. I just lost that argument when we were introducing that feature, but we haven't been bitten by that much.

And we haven't been bitten by this much is because Postgres

is fairly disciplined and and regimented in how it releases. It releases once a year.

Not that much stuff changes. Developers, for the most part,

don't care as much about this being we're in this version or that version.

You know, every now and then, some good developer features like JSON

showed up and, you know, developers care about those.

Otherwise, they just like they just it's like Linux. Right? Linux kinda works, you know, this version or that version, only the operator really cares,

But the the end user doesn't care as much.

On that note of developer focused features,

the topic that has sucked all the oxygen out of the room for everything else in tech is AI and generative models. Commensurate with that is the rise of vector databases.

Postgres has the PG vector extension.

As somebody who is

running a

platform as a service for Postgres,

what are some of the ways that you're thinking about the utility, the messaging around, and the impact on your business of PGVector

and the ways that it incorporates with the Postgres ecosystem?

Oh, it's been huge. We're actually contributing to PGVector.

There's a nice story there. Keiki, I think, is the number 2 contributor.

Still much smaller than the creator of pgevector, Andrew Kane, but nonetheless number 2 contributor.

We found a way to improve PG vector a year ago,

and, we realized that there is, you know, this index polish NSW.

We thought it should be in p g vector.

We didn't have a way to contribute to it, and then we built an alternative extension to VPG embedding

that demonstrated material improvements

for

the IVF flat implementation of the index that pj vector had while still has it. Now you can choose. Once we showed the science, Andrew started to work on HNSW

and,

introduced HNSW

PG vector. He's done a great job, and

that

basically,

prompted us to retire PG embedding. And we took all our knowledge that we've collected

by building PG embedding.

And what was applicable, we contributed back to PG vector.

So,

so that was our experience.

From the,

you know, business perspective,

oh, it's wonderful.

It's wonderful that this thing is there. We obviously support it. We're contributors.

We do things also. Our architecture makes it better to run PGA vector on Nian versus,

other platforms.

And specifically,

when Nian builds an index, and this is a very heavy compute operations because, you know, you do this a lot of the spectrum math as you, you know, create that index.

So super compute heavy, super memory heavy as well. Neon can temporarily

give you more compute and memory on demand

and then shrink it back down. So you don't need to commit to very large instances ahead of time

when you when you use DG Vector. So so that was great.

People build AI apps.

Each AI app needs stuff. Right? 1 of the biggest things, what makes an AI app an AI app, you'd talk to an LOM. Well, that's not us. But when you build a rack application, you do need a vector database, and then the rest of the plate, of the application chooses Postgres anyway.

So so that's what we do

with Neon, and so far, that's been working great for us.

I think there's gonna be more and more

demands, and especially as we add more developer features to the platform in addition to just the database,

we'll see more demand of having AI relevant features.

We have a bunch in the pipeline, and and we'll be announcing them kinda soon. Another element that we've touched on throughout is the fact that everything you're building is being released as open source and permissively licensed.

I'm wondering how you think about the relationship

between the open source code and your business model and the overall sustainability

of both.

So we're exposed to hyperscalers.

I don't think we're exposed to anybody else.

So in order for you to run a service like Nian,

you need to have

several pieces of the expertise.

1 is,

well, you need to understand what's written in the code. It's very scary to run somebody else's systems code.

And

if for an operational database, if there is a bug, you need to fix the bug. So you need to build that expertise.

You also need to, you know, stand it up, set up all the absorbability,

upgrade systems, like, basically, it's like set up processes that that allow you to run it well,

and then you need to build a a team of committers that touches every part of the stock.

So for a start up,

next to impossible.

For a large company like Amazon, Microsoft, Google, it is possible.

It's also possible to build this whole thing from scratch for, frankly, for Amazon,

Microsoft, and Google, and Amazon has already done it with Aurora.

Microsoft is a little behind, and that's why we are actually partnering with Microsoft.

And then, you know, tune in to some announcements.

And then Google has a project called AlloyDB,

which,

I think just like behind me with with the stuff that we can do.

So

I don't know. It is possible to to, like, quote unquote steal it, but

only a handful of companies

actually can.

In the US, it's Amazon, Google, and Microsoft, We're partnering with Microsoft, and Google and Amazon has already have already done it. So I think we're good.

As you mentioned,

your previous company was another database company. I'm wondering what are the lessons that you learned in the process of building and growing

single store previously known as MemSQL

that have been most useful to you in the work that you're doing on Neon?

The first 1 is focus. Right? We did too much at single store. You know, it's north of a $100, 000, 000 run rate business. So,

like, we we didn't fail, but we didn't take it public

yet,

and it's it's been some time.

So

and if you kinda zoom on the on the reason is we did too much. We were on prem when we were in the cloud. We're

supporting operational workloads and analytical workloads, and the the problems were different

in each 1 and shared nothing architecture,

either both for analytics and and,

and operational workloads

breaks compatibility

with the the mothership. For us, the mothership was MySQL.

We didn't use MySQL code because it's GPL,

but we, you know, use MySQL protocol and syntax,

and then turned out all the subtle bugs and the compatibility

is something that I have a lot of scars on. So

we

we certainly

fixing that in with with Nian. We're we're not breaking compatibility.

And then on

the analytics side,

well, cloud and object stores was something that we ignored for a while and then eventually caught up, but that was kinda too late.

So

I think the big 1 is focus

and then

driving very, very hard towards becoming the default.

And maybe

that will take some time, but for the outside observers

and then later for customers and partners,

it's very, very clear where you're going and where we want and we want to become the default development platform for Postgres,

and therefore, our architecture,

marketing.

So

if you do all the top line right, your your technology is very, very solid.

Your positioning is very, very solid. Your developer experience is solid. Your design is solid.

Bottom line, what kinda follows,

and that's our intention.

So

I I think at single store, we we had a lot well, I personally had a lot more energy. I lived in the office and

slept next to the servers,

but,

I

I didn't I lacked that maturity and focus, which, I'm I'm bringing it here at Neon.

In the work that you're doing on Neon and the ways that you're seeing people use it for their own use cases, what are some of the most interesting or innovative or unexpected ways that you're seeing Neon used?

The stuff that we didn't expect

is,

and people do it a lot, is to using,

1 1 database, 1 instance per tenant.

They're like, well, they're kinda cheap. They stand up in 22 100 milliseconds.

I'm just gonna run a full

blown database servers for full postgres for 1 user.

And we didn't expect that. Now there's, like, companies, they they run

fleets

of,

of instances,

and

it works actually really well if you have uneven consumption on a per client basis. Have a long tail

of customers, they barely use it. Okay. Well, you're basically paying 0 for those with Nina. And then some using quite a bit, and for that, you need elastic

compute.

So it allows you to kinda right size your usage very well.

The second thing that we didn't expect at all,

is that people run what if scenarios with our branching capability.

They're like, okay.

Every customer that it's exploring

and that there's specific financial planning product,

they're,

exploring the impact of certain changes.

Oh, well, we'll create a branch and then go hog wild on the branch, change data,

you know, reads, writes, whatever, and then compute the final result

that gives you an answer if what if scenario is successful or not. And then successful, you proceed. If not, you throw out the branch. That was another unexpected thing that I I I didn't even know that scenario existed.

The rise of our serverless driver was another surprise to me. It turned out JavaScript developers don't know what a connection

is and what a socket is, and I think it's great, actually.

Like, nobody needs to know. So so, you know, JavaScript

engineers consume me and using our serverless driver that allows you to to, you know, query me in over HTTP.

That was another interesting surprise.

What else?

Yeah. That those are probably the 3 most most interesting ones. Pg vector

caught us by surprise as well

Very quickly as we launched EMP, we were like, do you have PG vector? And and so that became Con X standard.

And in your experience

of building the Neon product, scaling the business, what are the most interesting or unexpected or challenging lessons that you've learned in the process?

I think we'll learn to automate a lot here.

So we

we actually don't scale

anything with people.

We're scaling

everything is scaled by technology.

So for example, there's a feed on my Slack for

new customers upgrades.

You know, people start with the free tier. When they upgrade, I get a notification on the feed. Everything

is

there's telemetry on everything.

And then there's a data team that is very busy,

producing

that that works for us.

We we make the decisions based on data,

and we think about the world as kinda like this real estate.

And we know we have good products, so we just need to claim that real estate in the world

and then measure everything.

So

that that I think

was also a difference between operationally running Nien versus running single store.

Also, Nien is fully remote

that allows us to tap into, unbelievable

talent around the world at the cost

of the communication overhead.

So

I still don't know which 1 I like better. It's very fun to go to offsites and then, like, meet the team and then all these people who who do incredible work and then seeing them in person.

And at the same time, you know, you do pay the communication overhead for people working from home.

So, yeah, those are those are the learnings.

As you were talking about the data that you're collecting, the dashboards that you're building,

obviously, to run a business, you need a database. It makes sense that you would use Neon internally as well.

What are the cases

where you're actually not using Neon and you need to turn to a different data engine?

We have 2 more data engines at least, maybe more.

1 is powering our Grafana dashboards.

And, for observability, you need an observability engine. And then we use Snowflake for all the reporting.

We're we are at 60 terabytes of data in Snowflake, and this number is shocking me. Like, how come tiny Neon generated so much data?

But, you know, we are where we are. Post's just not good for for for that.

It's interesting that there's more and more

work going into Postgres to support data and analytics scenario.

It's gonna be a while until it's gonna be like a full fledged data warehouse.

But think about it. Postgres is like Linux. It's a commoditization for us.

And a vectorized

column store query processor

is the future at the end of the day. You know, SQL Server and Oracle have those.

So we'll, we'll have them in posters too,

and that's coming. And integration with the data lakes is coming as well. There's already

plugins that people discuss on Hacker News that that provide such functionality

that allow you to create parquet. And in the future, you know, it's gonna be iceberg integration,

Delta parquet integration. So

all of that has come into the platform.

Where it stands today though for analytics,

it's good for small scale. It's not very good for our scale. And for that, we we use Snowflake.

The cases where Postgres is not the right choice is something that many people have already discussed in various

contexts.

But for the case where Postgres is the right choice, when is Neon not the right way to run Postgres?

There are,

well, there is the meme walking or going around where I just use Postgres.

And I think it's it's great for us.

I think it's great for the industry. Again, I think

the the era of lots and lots of database engines, that are built for purpose

is coming to an end.

And you certainly still need, you know, a data warehouse or a data lake, and history will tell if actually all you need is a data lake you need a data warehouse as data lake. We'll see.

And then you need an operational database, so that's process.

Then there's, like,

all the other things that

that you potentially need as well.

I think over time,

they will all go 1 way or another. Meaning, they will either be part of the data warehouse or they will be part of an operational database.

If operational databases and and analytical databases is gonna be 1,

that I don't know. I tried,

with single store, and again, we scaled this past

close to $100, 000, 000

in run rate.

I don't think we we had enough of an industry impact to say you just need 1.

Maybe. Maybe 1 day. But I still don't know even, you know, after 12 years of

a single store, I don't know if it's 10 years out

or 20 years

out. It's certainly not

2, 3, 4 years out because not only you you need to build technology, you need to change how people

build software, and that's like a toll order.

So I would say don't use Postgres for large scale analytics today,

and don't use, a data warehouse to power your

operational apps for OLTP.

In between, you can decide, you know, which way it goes. You have something in between.

Everything else kinda will will will be pulled in into 1 of those 2.

And as you continue to invest in

Neon, what are some of the things you have planned for the near to medium term or any projects or problem areas you're excited to explore further?

Oh, there's a ton. So

more clouds for sure. So I think

we want to be the default post with software in everywhere in the world,

and,

we're gonna launch another cloud this year. So I'm super excited about that.

That's 1.

We're gonna add more developer features. We're gonna have we're gonna make it much easier to build off payments and storage

and manage off payments and storage with Nian.

And we'll do it with with some partners. So super excited about that as well.

We're gonna launch your GitHub app,

which will make a much tighter integration for you with CICD

and, automatic creating of previews.

We're gonna start bridging some of this,

not replacing, but integrating

me in with the data lakes.

So,

that's another super exciting part.

And then we're gonna launch more platforms, meaning it's not our platforms, but other platforms that use Neon as the default database provider.

So,

yeah, that's a lot. So so I'm excited to pull all all of that off and then

launch

well with high production quality.

So I'm looking forward to all that.

Are there any other aspects of the Neon project and the business that you're building around it or the Postgres ecosystem that we didn't discuss yet that you would like to cover before we close out the show?

1 thing that is not obvious

to

everybody is how actually a few people moved the Postgres project forward.

And there are certain amount of aging that's happening in the core contributors to Postgres.

So

I think

what would be very useful for the industry, not just for us,

and we are contributing in a small way. We have a Postgres team, and

Neon engineers contributing to the the core Postgres project

even in places where

not kinda obvious how that benefits

Neon outside of just, like, well, post this gets better.

So we do some of that work, and Hickey continues some of that work, and he would use patches.

Right? The industry should train

more people

who

are Postgres kernel engineers

because of that aging problem.

And the the absolutely

top contributors to the Postgres engine are now in their fifties and sixties.

So

would be nice if more systems engineers from around the world, younger system engineers

started to contribute to Postgres.

This is called for engineers

and also called for the industry to sponsor this work. And the best way to sponsor this work is if you have a high dependency on Postgres, if you you're running lots of Postgres

instances in production,

it's not that expensive

for for the big business

to to have some of those engineers contribute to the post risk kernel.

Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.

Well,

can I talk about AI?

Absolutely.

I I think

well, we stood up a data team at Nian,

and

even for a small company like ours, it wasn't easy.

I think there should be an AI

data engineer

that,

can

kinda, like, put it all together for you

and describing that problem in a,

very

broad way

because I don't wanna

lead

to the answer. Right? This is definitely not text to SQL.

Text to SQL is like a tiny piece of that problem.

The problem is

I don't have a data practice.

I stand up this thing, and that thing figures out and stands up the data practice for me,

and acts like a human, and that's tricky.

But I think it's possible because now we see those. These systems like Devon from Cognition Labs,

you see all this, like,

AI engineer,

TypeWork. We see magic dot dev, like and and stuff that people are showing is quite magical. So you're like, okay. Well, it's coming.

I think that's 1 of the things that the the data management is missing.

You can go further.

You can say,

well, a data warehouse is a gigantic calculator.

Right? And, it's a gigantic calculator,

but in order to take advantage of that calculator,

you need to, like, really organize data. Put it into tables, columns, obsess about the schema, understand

this has a semantic meaning to it.

But imagine a gigantic brain

that you can just, like, shop data in, and that thing makes sense of that data, and then answers business questions.

So I don't know what this means to the future of data warehouses,

investing existed.

Now again, I'm, like, thinking a little bit far ahead on this,

but if we dream a little bit, then

we we may find unusually

different architectures

for for data and analytics

that are fully

AI driven. And not just AI on top of a data warehouse, but maybe changing the the architecture of the the whole data warehouse.

But, you know, we'll see.

It's definitely a very interesting future that I'll be excited to see how it develops. So thank you very much for taking the time today to join me, share the work that you and your team are putting into the Neon project. It's a very exciting project or very exciting product,

so I'm excited to see the ways that it continues to develop.

Thank you again for all the time and effort that you're all putting into that, and I hope you enjoy the rest of your day.

100%.

Thank you so much.

Thank you for listening.

Don't forget to check out our other shows, podcast dot in it, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning podcast,

which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com.

Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it.

Email hosts at data engineering podcast.com

with your

story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.

Data Engineering Podcast