PGConf.dev 2024 - New logical replication features in PostgreSQL 17

I attended the PostgreSQL Development Conference 2024, which was held in Vancouver, Canada. In this article, I will mainly introduce the content of the session I gave.

This was my second time attending an international conference, following last year, where I introduced my work on logical replication. I once again realized that PostgreSQL's scalability holds infinite possibilities.

What is the PostgreSQL Development Conference (PGConf.dev)?

Port of Vancouver near the conference venue

PGConf.dev is an international conference where PostgreSQL developers and community managers gather to give talks and hold discussions.

It stands out from general technical events by placing a strong emphasis on fostering meaningful interactions between PostgreSQL developers and community managers. Unlike conferences that are saturated with corporate advertising, PGConf.dev provides a platform where professionals can engage in insightful discussions, share innovative ideas, and collaborate on advancing the PostgreSQL ecosystem.

PGConf.dev is the successor to the PostgreSQL Conference (PGCon), which was held until last year, and it continues the tradition of bringing together experts and enthusiasts in the field. This year, it was held at Simon Fraser University (SFU) in Vancouver, Canada. The venue at SFU not only offered a good environment for learning and collaboration but also added a touch of academic charm to the conference setting.

Three people from Fujitsu - me, Amit Kapila, and Zhijie Hou - attended and hosted two talks.

My session: New features added to logical replication

My talk was mainly about logical replication.

Logical replication is a mechanism that extracts changes made to data and replicates them to another PostgreSQL server. A well-known replication feature of PostgreSQL is streaming replication, but this feature requires that the physical representation of data be consistent between nodes, so replication cannot be performed on heterogeneous operating systems or between different major versions of PostgreSQL. Logical replication relaxes these restrictions, making it possible to build a more flexible system.
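To make the publisher/subscriber relationship concrete, here is a minimal sketch that assembles the two SQL statements a basic logical replication setup uses: CREATE PUBLICATION on the node whose changes are extracted, and CREATE SUBSCRIPTION on the node that receives them. The helper functions and the names demo_pub, demo_sub, orders, customers, and the connection string are hypothetical and chosen only for illustration.

```python
def publication_sql(pub_name, tables):
    """Statement to run on the publisher (the node whose changes are extracted)."""
    return f"CREATE PUBLICATION {pub_name} FOR TABLE {', '.join(tables)};"


def subscription_sql(sub_name, conninfo, pub_name):
    """Statement to run on the subscriber (the node that applies the changes)."""
    escaped = conninfo.replace("'", "''")  # escape quotes inside the SQL literal
    return (
        f"CREATE SUBSCRIPTION {sub_name} "
        f"CONNECTION '{escaped}' PUBLICATION {pub_name};"
    )


if __name__ == "__main__":
    # All identifiers below are made-up examples, not taken from the talk.
    print(publication_sql("demo_pub", ["orders", "customers"]))
    print(subscription_sql("demo_sub", "host=publisher.example dbname=appdb", "demo_pub"))
```

Because only row changes for the published tables travel between the nodes, the two servers do not need identical physical data files, which is what allows different operating systems or major versions on each side.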

Differences between streaming replication and logical replication

Starting with PostgreSQL 17, a new server application for creating logical standbys (subscribers) will be added, and pg_upgrade can be used without breaking logical replication configurations. In my talk, I explained these new features of the upcoming version.

Resolving the issues of setting up new logical replication in large-scale environments

Although logical replication is being actively developed, it still has some problems.

One of them is that it is difficult to set up new logical replication in a large-scale environment. In logical replication, a COPY statement is first issued for all target tables to perform initial data synchronization. Therefore, initial data synchronization may take a long time depending on the number of tables and the amount of data involved.

In addition, since logical replication needs to keep the WAL generated during synchronization, if the synchronization time is too long, the WAL storage disk may become full and the server process may crash.
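A rough back-of-the-envelope calculation shows how the two problems interact. The sketch below assumes a constant COPY throughput and a constant WAL generation rate on the publisher, which is a simplification, and all of the figures are invented for illustration rather than measured.

```python
def estimate_initial_sync(total_data_gb, copy_rate_mb_s, wal_rate_mb_s):
    """Return (sync_hours, retained_wal_gb) for a COPY-based initial sync.

    Assumes constant rates: the sync time is data size / copy rate, and the
    publisher must keep all WAL generated during that time.
    """
    sync_seconds = total_data_gb * 1024 / copy_rate_mb_s
    retained_wal_gb = wal_rate_mb_s * sync_seconds / 1024
    return sync_seconds / 3600, retained_wal_gb


if __name__ == "__main__":
    # Hypothetical workload: 2 TB of table data, 64 MB/s COPY, 8 MB/s of WAL.
    hours, wal_gb = estimate_initial_sync(2048, 64, 8)
    print(f"{hours:.1f} h, {wal_gb:.0f} GB")  # prints "9.1 h, 256 GB"
```

Under these assumed numbers the publisher would have to hold about 256 GB of WAL for roughly nine hours, so a WAL volume smaller than that would fill up before synchronization finished.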

Known challenges in creating a subscriber

Therefore, we focused on read replicas (asynchronous physical standbys) that may exist in the system, and developed pg_createsubscriber, a server application that converts physical standbys into subscribers.

The root of the problem is that a large amount of data has to be copied from scratch. A physical standby that has been following the primary through streaming replication already holds that data, so building logical replication on top of such a node shortens the time required for initial synchronization. The problem of large amounts of retained WAL is also solved, because initial data synchronization using the COPY statement is not performed in the first place.
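As a sketch of what an invocation might look like, the following builds a pg_createsubscriber command line without running it. The long options used here (--pgdata, --publisher-server, --database, --dry-run) are the ones I understand the PostgreSQL 17 documentation to list; the data directory, connection string, and database name are hypothetical placeholders, and the builder function itself is only an illustration.

```python
import shlex


def createsubscriber_cmd(standby_datadir, publisher_conninfo, databases, dry_run=True):
    """Build (but do not execute) a pg_createsubscriber command line."""
    cmd = [
        "pg_createsubscriber",
        "--pgdata", standby_datadir,                # data directory of the physical standby
        "--publisher-server", publisher_conninfo,   # connection string to the publisher
    ]
    for db in databases:
        cmd += ["--database", db]                   # one option per database to convert
    if dry_run:
        cmd.append("--dry-run")                     # check what would happen, change nothing
    return cmd


if __name__ == "__main__":
    # Placeholder values; adjust to the actual environment.
    cmd = createsubscriber_cmd(
        "/var/lib/pgsql/17/standby",
        "host=primary.example port=5432",
        ["appdb"],
    )
    print(shlex.join(cmd))
```

Starting with a dry run is a cautious default here, since converting a standby into a subscriber is not something to attempt for the first time on a production node. As far as I know, the tool also expects the standby to be stopped before it is run, so the documentation is worth checking for the exact prerequisites.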

New server application pg_createsubscriber in PostgreSQL 17

Simplifying the upgrade process for logical replication clusters

PostgreSQL 17 also solves another issue that logical replication had: it was not (practically) compatible with pg_upgrade.

When a logical replication environment is built, objects such as replication slots and replication origins are created. These are necessary to record the replication status, such as how far WAL has been sent and applied, but because they are node-specific information, they were not migrated by upgrades using pg_upgrade. Therefore, after upgrading a node that was part of a logical replication setup, users had to manually reconstruct these internal objects.

For this reason, we have improved pg_upgrade so that it reads these internal objects from the old cluster and rebuilds them in the new one. This improvement makes it possible to resume replication automatically even after upgrading a node that is part of a logical replication setup.
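Migrating replication slots still depends on how the new cluster is configured. The sketch below checks two prerequisites that, as I understand the pg_upgrade documentation, apply to the new cluster: wal_level must be logical, and max_replication_slots must be at least the number of logical slots in the old cluster. This is an illustrative subset under that assumption, not the full list of checks that pg_upgrade performs, and the function and its parameters are hypothetical.

```python
def slot_migration_problems(old_logical_slot_count, new_wal_level, new_max_replication_slots):
    """Return human-readable problems that would likely block slot migration.

    Covers only two assumed prerequisites on the new cluster; pg_upgrade
    itself checks more than this.
    """
    problems = []
    if new_wal_level != "logical":
        problems.append("new cluster: wal_level must be 'logical'")
    if new_max_replication_slots < old_logical_slot_count:
        problems.append(
            "new cluster: max_replication_slots "
            f"({new_max_replication_slots}) is below the old cluster's "
            f"logical slot count ({old_logical_slot_count})"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical settings for a new cluster that has not been tuned yet.
    for problem in slot_migration_problems(3, "replica", 2):
        print(problem)
```

In practice the values would come from pg_replication_slots on the old cluster and from the new cluster's configuration, and running pg_upgrade with its check option before the real upgrade remains the authoritative test.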

Wrapping up

This was my second time attending an international conference, following last year, where I introduced my work on logical replication.

Because PGConf.dev focuses on interaction between developers, I was able to discuss solutions to logical replication issues with the participating developers, and I was happy to have insightful discussions with my fellow professionals there. I once again realized that PostgreSQL's scalability holds infinite possibilities.

Our team at Fujitsu hopes to continue to actively propose ideas and participate in discussions at conferences, further contributing to the development of PostgreSQL.

Further information

For details about the talks at PGConf.dev 2024, please see below.

Topics: PostgreSQL community, PostgreSQL development, Logical replication, PostgreSQL event