Continuous WAL Archiving for Linux

This article provides step-by-step instructions for setting up continuous write-ahead log (WAL) archiving in an OnPremises Linux environment. This incremental data backup process is beneficial for customers with a large amount of test and test-result data.

The instructions below back up only the incremental changes since your last backup, which is quicker and more disk-space efficient than performing a full database backup.

To utilize continuous WAL archiving, you will:

  • Set up incremental database backups and perform a 'restore' from incremental backups if necessary.

  • Set up a 'read' replica database which provides continuous backup and live fail-over capability.

You will need to be a PostgreSQL Superuser to perform the following actions. 

Edit PostgreSQL

Edit postgresql.conf

  1. Open the postgresql.conf file for your PostgreSQL installation.

    Example: /etc/postgresql/9.6/main/postgresql.conf

  2. Configure archiving to the backup folder /bck/pgarchive with the following settings:

    Copy
    # The archive_mode must be set to on for archiving to happen.
    archive_mode = on

    # The command to invoke for each WAL file to be archived.
    archive_command = 'test ! -f /bck/pgarchive/%f && cp %p /bck/pgarchive/%f'

    # Ensure there is at least one WAL file for each archive_timeout duration;
    # this forces the server to switch to a new WAL segment file at least that
    # often. Settings of a minute or so are usually reasonable.
    archive_timeout = 900

    # Set high enough to leave at least one session available for the backup.
    max_wal_senders = 3

    # All WAL records required for the backup must contain sufficient
    # full-page writes.
    full_page_writes = on

    # Keep around at least this many WAL files (aka segments), 16 MB each;
    # 0 disables. Set this high enough that the log is not removed before
    # the end of the backup.
    wal_keep_segments = 64

    # The WAL level must be archive or higher for continuous WAL archiving,
    # and hot_standby or higher for a standby. Possible values are minimal,
    # archive, hot_standby, and logical.
    wal_level = logical
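The guard in the archive_command above ("test ! -f") makes the copy refuse to overwrite a segment that is already archived, which is the behavior PostgreSQL requires. This sketch exercises that guard in scratch directories (no live server needed; all paths are temporary stand-ins for /bck/pgarchive and a WAL segment):

```shell
archive=$(mktemp -d)          # stands in for /bck/pgarchive
wal=$(mktemp)                 # stands in for a WAL segment path (%p)
f=000000010000000000000001    # stands in for the WAL file name (%f)
printf 'segment-data' > "$wal"

# First archive attempt: the target is absent, so the copy runs and exits 0.
if test ! -f "$archive/$f" && cp "$wal" "$archive/$f"; then
  echo "archived $f"
fi

# Second attempt: the target exists, so the command exits non-zero;
# PostgreSQL would keep the segment and retry archiving later.
if ! { test ! -f "$archive/$f" && cp "$wal" "$archive/$f"; }; then
  echo "refused to overwrite $f"
fi
```

A non-zero exit from archive_command tells the server the segment was not archived, so no WAL is lost if the archive destination is briefly unavailable.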

Edit pg_hba.conf

To allow local replication connections, add the following lines to pg_hba.conf:

Copy
host    replication     PostgreSQL      127.0.0.1/32    md5
host    replication     PostgreSQL      ::1/128         md5

Restart PostgreSQL Service

To restart the PostgreSQL service, use the following commands:

Copy
sudo systemctl status postgresql
sudo systemctl stop postgresql
sudo systemctl start postgresql

Base Backup

The base backup should be done weekly or daily based on the size of your database.

The directory that contains the backup may exist, but it should be empty. You can set the name of the backup folder to the date of the backup and let the tool create it when the backup occurs.
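A dated backup folder name can be derived as in the sketch below. BACKUP_ROOT is a placeholder for a path such as /bck/pgbasebackup, and pg_basebackup itself requires a running server, so it appears only as a comment:

```shell
# Derive a per-date backup directory; pg_basebackup creates it on demand.
# BACKUP_ROOT is a placeholder (e.g. /bck/pgbasebackup in this article).
BACKUP_ROOT=${BACKUP_ROOT:-$(mktemp -d)}
backup_dir="$BACKUP_ROOT/$(date +%Y-%m-%d)"

# Against a live server you would then run, for example:
#   pg_basebackup --host=localhost --username=PostgreSQL \
#       --xlog --progress --pgdata="$backup_dir"
echo "$backup_dir"
```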

Use the following command and enter the appropriate information for your site:

Copy
pg_basebackup --host=<host> --port=<port, default 5432> --username=<user> --xlog --progress --pgdata=</path/to/datadir>

Example:

Copy
ubuntu@ip-10-0-102-246:~$ time sudo -u PostgreSQL pg_basebackup --host=localhost --port=5432 --username=PostgreSQL --xlog --progress --pgdata=/bck/pgbasebackup/bck1
Password:
413970/413970 kB (100%), 1/1 tablespace

real    0m5.307s
user    0m0.384s
sys     0m0.516s

Cleanup Obsolete Archive Log Files

Complete this process after the base backup process runs successfully.

  1. Use the following command to clean up the archive backup folder:

    Copy
    pg_archivecleanup -d <archive/location> <oldest_kept_wal_file>

    Example:

    The example includes 3 base backups and has 3 files indicating the backup info:

    Copy
     00000003000000050000004B.00000028.backup    # bck1
     00000003000000050000004D.00000028.backup    # bck2
     000000030000000500000051.00000028.backup    # bck3
  2. List all files in the archive backup folder:

    Copy
    PostgreSQL@ip-10-0-102-246:/bck$ ls -l pgarchive/
    total 163852
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:29 000000030000000500000048
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:29 000000030000000500000049
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:31 00000003000000050000004A
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:31 00000003000000050000004B
    -rw------- 1 PostgreSQL PostgreSQL      305 Jun 29 08:31 00000003000000050000004B.00000028.backup
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:31 00000003000000050000004C
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:32 00000003000000050000004D
    -rw------- 1 PostgreSQL PostgreSQL      305 Jun 29 08:32 00000003000000050000004D.00000028.backup
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:37 00000003000000050000004E
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:42 00000003000000050000004F
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:44 000000030000000500000050
    -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:44 000000030000000500000051
    -rw------- 1 PostgreSQL PostgreSQL      305 Jun 29 08:44 000000030000000500000051.00000028.backup
  3. Delete the archive log files older than backup#2 (bck2):

    Copy
    PostgreSQL@ip-10-0-102-246:/bck$ pg_archivecleanup -d /bck/pgarchive 00000003000000050000004D.00000028.backup
    pg_archivecleanup: keep WAL file "/bck/pgarchive/00000003000000050000004D" and later
    pg_archivecleanup: removing file "/bck/pgarchive/000000030000000500000049"
    pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004C"
    pg_archivecleanup: removing file "/bck/pgarchive/000000030000000500000048"
    pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004A"
    pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004B"

    You will need to set up a maintenance job through cron, or manually perform the 'Base Backup' and 'Cleanup Obsolete Archive Log Files' steps daily or weekly depending on the size of your database.
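On Linux this scheduling is typically done with cron. A sketch of crontab entries (edit with crontab -e as the database service user); the two script paths are hypothetical placeholders for your own wrappers around pg_basebackup and pg_archivecleanup:

```
# Weekly base backup, Sundays at 01:00 (placeholder script path):
0 1 * * 0 /usr/local/bin/pg_base_backup.sh
# Daily cleanup of obsolete archive logs at 02:00 (placeholder script path):
0 2 * * * /usr/local/bin/pg_archive_cleanup.sh
```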

Data Recovery

Stop Database Server

  1. Check the status of your PostgreSQL database using the following command: sudo systemctl status postgresql

  2. Stop the service using the following command: sudo systemctl stop postgresql

Backup Current Database File

If you have the space to do so, it is recommended to copy the whole cluster data directory and any table spaces to a temporary location in case you need them later.

This precaution requires enough free space on your system to hold two copies of your existing database. If there is not enough space, you should at least save the contents of the cluster's pg_xlog sub-directory, as it might contain logs which were not archived before the system went down.
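Saving pg_xlog before moving the data directory aside can be sketched as follows; the scratch paths here are temporary stand-ins for the real cluster directory (/var/lib/postgresql/9.6/main) and a save location under /bck:

```shell
# Preserve the cluster's pg_xlog contents before recovery; unarchived
# segments there may be needed later.
cluster=$(mktemp -d)              # stands in for the cluster data directory
save=$(mktemp -d)/pg_xlog_saved   # stands in for a save location under /bck
mkdir -p "$cluster/pg_xlog"
printf 'unarchived' > "$cluster/pg_xlog/000000010000000000000002"

# Copy pg_xlog aside, preserving permissions and timestamps.
cp -rp "$cluster/pg_xlog" "$save"
echo "saved pg_xlog to $save"
```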

Use the following command to move the current data directory aside, preserving all current data files before recovery:

Copy
mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main.before_recovery

Restore the Database Files from your File System Backup

Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using table spaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
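One way to spot wrongly-owned files after the restore is with find. This is a sketch using placeholder variables: in production, DATADIR would be the restored data directory (e.g. /var/lib/postgresql/9.6/main) and DBUSER the database system user; the defaults below let the sketch run against a scratch directory:

```shell
# List any file under the data directory not owned by the database user;
# any output means a 'chown -R' is still needed.
DATADIR=${DATADIR:-$(mktemp -d)}
DBUSER=${DBUSER:-$(id -un)}
touch "$DATADIR/PG_VERSION"    # sample restored file

bad=$(find "$DATADIR" ! -user "$DBUSER" -print)
if [ -z "$bad" ]; then
  echo "ownership OK"
else
  printf '%s\n' "$bad"
fi
```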

Remove any files present in pg_xlog/, then copy any unarchived WAL segment files you saved from the old cluster back into pg_xlog/.

In this example, we will use the base backup #3 (bck3) to restore.

  1. Copy from base backup, using the following command:

    Copy
    cp -rp /bck/pgbasebackup/bck3 /var/lib/postgresql/9.6/main
  2. Cleanup logs in pg_xlog, using the following command:

    Copy
    rm -rf /var/lib/postgresql/9.6/main/pg_xlog/*
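The two steps above can be exercised end-to-end in scratch directories; the paths below are temporary stand-ins for /bck/pgbasebackup/bck3 and the live data directory:

```shell
# Simulate the restore: copy a base backup into place, then clear the
# stale WAL that came along with it.
bck3=$(mktemp -d)        # stands in for /bck/pgbasebackup/bck3
scratch=$(mktemp -d)
datadir="$scratch/main"  # stands in for /var/lib/postgresql/9.6/main

mkdir -p "$bck3/pg_xlog"
printf 'stale' > "$bck3/pg_xlog/000000010000000000000001"
printf 'cfg'   > "$bck3/postgresql.auto.conf"

# Step 1: copy the base backup, preserving permissions and timestamps.
cp -rp "$bck3" "$datadir"

# Step 2: remove stale WAL copied along with the base backup.
rm -rf "$datadir"/pg_xlog/*

echo "pg_xlog cleaned; data files restored"
```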

Create New recovery.conf File

There are two options for data recovery:

  1. Restore to the most recent state, using only one parameter in the recovery.conf file:

     restore_command: tells PostgreSQL how to retrieve archived WAL file segments.

  2. Restore to a specific point in time, using two parameters in the recovery.conf file:

     restore_command: tells PostgreSQL how to retrieve archived WAL file segments.

     recovery_target_time: specifies the time stamp up to which recovery will proceed.

You can get a sample file from: /usr/share/postgresql/9.6/recovery.conf.sample.

OR

Create a new empty file recovery.conf: /var/lib/postgresql/9.6/main/recovery.conf.

  1. To restore to a specific point in time, save the following settings in that file:

    Copy
    restore_command = 'cp /bck/pgarchive/%f %p'
    recovery_target_time = '2018-07-02 05:50:00.0'

  2. Or, to restore to the most recent state, use only the restore_command parameter:

    Copy
    restore_command = 'cp /bck/pgarchive/%f %p'

Start the Database Server to Perform Recovery

Start the server, which will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery.

Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.

  1. To start the database, use the following command:

    Copy
    sudo service postgresql start
  2. Review your log file 'PostgreSQL.log,' as seen in the example below:

    2018-07-03 06:41:52 UTC [1634-852] LOG: restored log file "00000003000000080000009F" from archive
    2018-07-03 06:41:52 UTC [1634-853] LOG: restored log file "0000000300000008000000A0" from archive
    2018-07-03 06:41:52 UTC [1634-854] LOG: restored log file "0000000300000008000000A1" from archive
    2018-07-03 06:41:53 UTC [1634-855] LOG: restored log file "0000000300000008000000A2" from archive
    2018-07-03 06:41:53 UTC [1634-856] LOG: restored log file "0000000300000008000000A3" from archive
    2018-07-03 06:41:53 UTC [1634-857] LOG: restored log file "0000000300000008000000A4" from archive
    2018-07-03 06:41:53 UTC [1634-858] LOG: restored log file "0000000300000008000000A5" from archive
    2018-07-03 06:41:54 UTC [1634-859] LOG: restored log file "0000000300000008000000A6" from archive
    2018-07-03 06:41:54 UTC [1634-860] LOG: restored log file "0000000300000008000000A7" from archive
    2018-07-03 06:41:54 UTC [1634-861] LOG: restored log file "0000000300000008000000A8" from archive
    2018-07-03 06:41:54 UTC [1634-862] LOG: recovery stopping before commit of transaction 13924, time 2018-07-02 07:48:10.279609+00
    2018-07-03 06:41:54 UTC [1634-863] LOG: redo done at 8/A801D758
    2018-07-03 06:41:54 UTC [1634-864] LOG: last completed transaction was at log time 2018-07-02 05:26:23.275142+00
    2018-07-03 06:41:54 UTC [1634-865] LOG: selected new timeline ID: 4
    2018-07-03 06:41:55 UTC [1634-866] LOG: archive recovery complete
    2018-07-03 06:41:55 UTC [1634-867] LOG: MultiXact member wraparound protections are now enabled
    2018-07-03 06:41:55 UTC [1633-1] LOG: database system is ready to accept connections
    2018-07-03 06:41:55 UTC [3538-1] LOG: autovacuum launcher started

Setup Hot Replication

These instructions will use the example server information below:

  • Master Server IP: 10.0.102.246, configured with continuous WAL archiving as described above.

  • Standby Server IP: 10.0.102.57

Master Server Configuration

  1. Edit pg_hba.conf to allow access from the standby server.

    Example: host    replication     PostgreSQL      10.0.102.57/32      md5

  2. Restart the Master Server.

Standby Server Configuration

  1. Stop the database service, using the following command:

    Copy
    sudo systemctl stop postgresql
  2. Move the database folder to another location.

    Example: mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main_old

  3. Stream data from the master server to the standby server, using the following command:

    Copy
    pg_basebackup -h <master_server> -p <port> -D <PostgreSQL/data/path> -U <user> -v -P --xlog-method=stream

    Example:

    Copy
    ubuntu@ip-10-0-102-57:~$ sudo -u PostgreSQL pg_basebackup -h 10.0.102.246 -D /var/lib/postgresql/9.6/main -U PostgreSQL -v -P --xlog-method=stream
    Password:
    pg_basebackup: initiating base backup, waiting for checkpoint to complete
    pg_basebackup: checkpoint completed
    transaction log start point: 9/C2000028 on timeline 5
    pg_basebackup: starting background WAL receiver
    397557/397557 kB (100%), 1/1 tablespace
    transaction log end point: 9/C20000F8
    pg_basebackup: waiting for background process to finish streaming ...
    pg_basebackup: base backup completed

Modify postgresql.conf

Add the following setting to your postgresql.conf file:

Copy
hot_standby = on

Create the Recovery Configuration File recovery.conf

  • You can get a sample file from: /usr/share/postgresql/9.6/recovery.conf.sample OR

  • Create a new empty file recovery.conf: /var/lib/postgresql/9.6/main/recovery.conf

Specify the master server host and credentials in the primary_conninfo parameter, as in the following example:

Copy
standby_mode = on
primary_conninfo = 'host=10.0.102.246 port=5432 user=PostgreSQL password=XXXXXX'

Start the Standby Server

  1. Start the Standby Server, using the following command:

    Copy
    sudo systemctl start postgresql
  2. Review your log file 'PostgreSQL.log,' as seen in the example below:

    2018-07-03 08:29:39.517 UTC [11849] LOG: received fast shutdown request
    2018-07-03 08:29:39.517 UTC [11849] LOG: aborting any active transactions
    2018-07-03 08:29:39.518 UTC [11854] LOG: autovacuum launcher shutting down
    2018-07-03 08:29:39.519 UTC [11851] LOG: shutting down
    2018-07-03 08:29:39.534 UTC [11849] LOG: database system is shut down
    2018-07-03 09:01:39.120 UTC [12142] LOG: database system was interrupted; last known up at 2018-07-03 08:49:20 UTC
    2018-07-03 09:01:39.157 UTC [12142] LOG: entering standby mode
    2018-07-03 09:01:39.164 UTC [12142] LOG: redo starts at 9/C2000028
    2018-07-03 09:01:39.166 UTC [12142] LOG: consistent recovery state reached at 9/C20000F8
    2018-07-03 09:01:39.166 UTC [12141] LOG: database system is ready to accept read only connections
    2018-07-03 09:01:39.176 UTC [12146] LOG: started streaming WAL from primary at 9/C3000000 on timeline 5

Test the Hot Replication

  1. On the Master Server, create a new table and insert one record, using the following command:

    Copy
    qTestop=# create table ztmprep (name text);
    CREATE TABLE
    qTestop=# insert into ztmprep(name) values ('test replication');
    INSERT 0 1
  2. Query the record on the Standby Server you just inserted on the Master Server, using the following command:

    Copy
    qTestop=# select * from ztmprep;
           name
    ------------------
     test replication
    (1 row)

    The Standby Server is read-only, so no DML is allowed here and an error will be returned.

    Example:

    Copy
    qTestop=# insert into ztmprep(name) values ('test insert in standby server');
    ERROR:  cannot execute INSERT in a read-only transaction

Promote Standby Server to Master Server

Use the following command to promote the Standby Server to the Master Server:

Copy
pg_ctl promote -D </path/to/data/folder>

Example:

Copy
$ pg_ctl promote -D /var/lib/postgresql/9.6/main
server promoting

Test Promoted Server

Test the Promoted Server with the following command:

Copy
testrep=# create table ztmppromote(name text);
CREATE TABLE
testrep=# insert into ztmppromote(name) values('test insert in promoted server');
INSERT 0 1

Update Your qTest Application Configuration to Use the Standby Database

  1. Execute the qTestctl command to open the configuration wizard, then update your PostgreSQL host.

    Refer to 2026.2 OnPremises Single Server Upgrade Guide for Linux for a refresher on using the Command Line Wizard to update your PostgreSQL Host. 

  2. Restart the qTest service.