Reducing bloat in tables and indexes with the pg_repack extension (original) (raw)

You can use the pg_repack extension to remove bloat from tables and indexes as an alternative to VACUUM FULL. This extension is supported on RDS for PostgreSQL versions 9.6.3 and higher. For more information on the pg_repack extension and the full table repack, see the GitHub project documentation.

Unlike VACUUM FULL, the pg_repack extension requires an exclusive lock (AccessExclusiveLock) only for a short period of time during the table rebuild operation in the following cases:

postgres=>\dt+ repack.log_*  
List of relations  
-[ RECORD 1 ]-+----------  
Schema        | repack  
Name          | log_16490  
Type          | table  
Owner         | postgres  
Persistence   | permanent  
Access method | heap  
Size          | 65 MB  
Description   |  

For the rest of the rebuild operation, it only needs an ACCESS SHARE lock on the original table to copy rows from it to the new table. This helps the INSERT, UPDATE, and DELETE operations to proceed as usual.

Recommendations

The following recommendations apply when you remove bloat from the tables and indexes using the pg_repack extension:

SELECT pg_terminate_backend(pid);  

Pre-requisites

SELECT pg_size_pretty(pg_total_relation_size('table_name')) AS total_table_size;  

This space is reclaimed after the completion of the activity.

To use the pg_repack extension
  1. Install the pg_repack extension on your RDS for PostgreSQL DB instance by running the following command.
CREATE EXTENSION pg_repack;  
  1. Run the following commands to grant write access to temporary log tables created bypg_repack.
ALTER DEFAULT PRIVILEGES IN SCHEMA repack GRANT INSERT ON TABLES TO PUBLIC;  
ALTER DEFAULT PRIVILEGES IN SCHEMA repack GRANT USAGE, SELECT ON SEQUENCES TO PUBLIC;  
  1. Connect to the database using the pg_repack client utility. Use an account that has rds_superuser privileges. As an example, assume thatrds_test role has rds_superuser privileges. The following syntax performs pg_repack for full tables including all the table indexes in the postgres database.
pg_repack -h db-instance-name.111122223333.aws-region.rds.amazonaws.com -U rds_test -k postgres  
Note

You must connect using the -k option. The -a option is not supported.
The response from the pg_repack client provides information on the tables on the DB instance that are repacked.

INFO: repacking table "pgbench_tellers"  
INFO: repacking table "pgbench_accounts"  
INFO: repacking table "pgbench_branches"  
  1. The following syntax repacks a single table orders including indexes inpostgres database.
pg_repack -h db-instance-name.111122223333.aws-region.rds.amazonaws.com -U rds_test --table orders -k postgres  

The following syntax repacks only indexes for orders table inpostgres database.

pg_repack -h db-instance-name.111122223333.aws-region.rds.amazonaws.com -U rds_test --table orders --only-indexes -k postgres  

Monitoring the new table during the repack

Note

You can use pgstattuple extension to calculate the bloat in the table. For more information, see pgstattuple.

postgres=>SELECT relname,n_live_tup FROM pg_stat_all_tables WHERE schemaname = 'repack' AND relname ILIKE '%log%';  
          
-[ RECORD 1 ]---------  
relname    | log_16490  
n_live_tup | 2000000  
          
postgres=>SELECT  
     SUBSTR(query, 1, 100) query,  
     round((round(total_exec_time::numeric, 6) / 1000 / 60),4) total_exec_time_in_minutes  
 FROM  
     pg_stat_statements  
 WHERE  
     query ILIKE '%repack%'  
 ORDER BY  
     total_exec_time DESC LIMIT 5;  
          
 query                                                                 | total_exec_time_in_minutes  
-----------------------------------------------------------------------+----------------------------  
 CREATE UNIQUE INDEX index_16493 ON repack.table_16490 USING btree (a) |                     6.8627  
 INSERT INTO repack.table_16490 SELECT a FROM ONLY public.t1           |                     6.4150  
 SELECT repack.repack_apply($1, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mo separator="true">,</mo></mrow><annotation encoding="application/x-tex">2, </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8389em;vertical-align:-0.1944em;"></span><span class="mord">2</span><span class="mpunct">,</span></span></span></span>3, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>4</mn><mo separator="true">,</mo></mrow><annotation encoding="application/x-tex">4, </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8389em;vertical-align:-0.1944em;"></span><span class="mord">4</span><span class="mpunct">,</span></span></span></span>5, $6)                    |                     0.5395  
 SELECT repack.repack_drop($1, $2)                                     |                     0.0004  
 SELECT repack.repack_swap($1)                                         |                     0.0004  
(5 rows)  
          

Repacking is completely an out-of-place operation so the original table is not impacted and we do not anticipate any unexpected challenges that require recovery of the original table. If repack fails unexpectedly, you must inspect the cause of the error and resolve it.

After the issue is resolved, drop and recreate the pg_repack extension in the database where the table exists, and retry the pg_repack step. In addition, the availability of compute resources and concurrent accessibility of the table plays a crucial role in the timely completion of the repack operation.