Continuous WAL Archiving for Linux
This article includes step-by-step instructions for how to setup continuous write ahead log (WAL) archiving in an OnPremises Linux environment, with a large number of tests and test results. This incremental data backup process is beneficial for customers with a large amount of test and test results data.
The instructions below will only backup incremental changes from your last backup, which is quicker and more disk-space efficient than performing a full database backup.
To utilize continuous WAL archiving, you will:
-
Setup incremental database backup and perform a 'restore' from incremental backups if necessary.
-
Setup a 'read' replica database which will provide continuous backup and live fail-over capability.
You will need to be a PostgreSQL Superuser to perform the following actions.
Edit PostgreSQL
Edit postgresql.conf
-
Access your PostgreSQL database.
Example: /etc/postgresql/16.x/main/postgresql.conf
-
Archive to the backup folder /bck/pgarchive using the following commands:
Copy# The archive_mode must be set to on for archiving to happen. archive_mode = on # This is the command to invoke for each WAL file to be archived. archive_command = 'test ! -f /bck/pgarchive/%f && cp %p /bck/pgarchive/%f'
# Ensure there is at least one WAL file for each "archive_timeout" duration, it forces the server to switch to a new WAL segment file at least that often, archive_timeout settings of a minute or so are usually reasonable archive_timeout = 900
# Set high enough to leave at least one session available for the backup max_wal_senders = 3
# All WAL records required for the backup must contain sufficient full-page writes full_page_writes = on
# Keep around at least these many WAL files (aka segments). 16MB each; 0 disables, should be set high enough that the log is not removed before the end of the backup. wal_keep_segments = 64
# The WAL level must be archive or higher for continuous WAL archiving, hot_standby or higher for standby, possible values are minimal, archive, hot_standby, logical. wal_level = logical
Edit pg_hba.conf
To edit the PostgreSQL host replication, enter the following commands:
host replication PostgreSQL 127.0.0.1/32 md5 host replication PostgreSQL ::1/128 md5
Restart PostgreSQL Service
To restart the PostgreSQL service, use the following commands:
sudo systemctl status postgresql sudo systemctl stop postgresql sudo systemctl start postgresql
Base Backup
The base backup should be done weekly or daily based on the size of your database.
The directory that contains the backup may exist but it should be empty. You can set the name of the backup folder to the date of the backup and let the tool create it when back up occurs.
Use the following command and enter the appropriate information for your site:
pg_basebackup --host=<host> --port=<port, default 5432> --username=<user> --xlog --progress --pgdata=</path/to/datadir> Example: ubuntu@ip-10-0-102-246:~$ time sudo -u PostgreSQL pg_basebackup --host=localhost --port=5432 --username=PostgreSQL --xlog --progress --pgdata=/bck/pgbasebackup/bck1 Password: 413970/413970 kB (100%), 1/1 tablespace real 0m5.307s user 0m0.384s sys 0m0.516s
Cleanup Obsolete Archive Log Files
Complete this process after the base backup process runs successfully.
-
Use the following command to clean up the archive backup folder:
Copypg_archivecleanup -d <archive/location> <oldest_kept_wal_file>Example:
The example includes 3 base backups and has 3 files indicating the backup info:
Copy00000003000000050000004B.00000028.backup #bck1 00000003000000050000004D.00000028.backup #bck2 000000030000000500000051.00000028.backup #bck3 -
List all files in the archive backup folder:
CopyPostgreSQL@ip-10-0-102-246:/bck$ ls -l pgarchive/ total 163852 -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:29 000000030000000500000048 -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:29 000000030000000500000049 -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:31 00000003000000050000004A -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:31 00000003000000050000004B -rw------- 1 PostgreSQL PostgreSQL 305 Jun 29 08:31 00000003000000050000004B.00000028.backup -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:31 00000003000000050000004C -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:32 00000003000000050000004D -rw------- 1 PostgreSQL PostgreSQL 305 Jun 29 08:32 00000003000000050000004D.00000028.backup -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:37 00000003000000050000004E -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:42 00000003000000050000004F -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:44 000000030000000500000050 -rw------- 1 PostgreSQL PostgreSQL 16777216 Jun 29 08:44 000000030000000500000051 -rw------- 1 PostgreSQL PostgreSQL 305 Jun 29 08:44 000000030000000500000051.00000028.backup -
Delete the archive log files older than backup#2 (bck2):
CopyPostgreSQL@ip-10-0-102-246:/bck$ pg_archivecleanup -d /bck/pgarchive 00000003000000050000004D.00000028.backup pg_archivecleanup: keep WAL file "/bck/pgarchive/00000003000000050000004D" and later pg_archivecleanup: removing file "/bck/pgarchive/000000030000000500000049" pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004C" pg_archivecleanup: removing file "/bck/pgarchive/000000030000000500000048" pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004A" pg_archivecleanup: removing file "/bck/pgarchive/00000003000000050000004B"You will need to setup a maintenance job through Windows Scheduler, or manually perform the 'Base Backup' and 'Cleanup Obsolete Archive Log Files' steps daily or weekly depending on the size of your database.
Data Recovery
Stop Database Server
-
Check the status of your PostgreSQL database using the following command:
sudo systemctl status postgresql -
Stop the service using the following command:
sudo systemctl stop postgresql
Backup Current Database File
If you have the space to do so, it is recommended to copy the whole cluster data directory and any table spaces to a temporary location in case you need them later.
This precaution will require that you have enough free space on your system to hold two copies of your existing database. If there is not enough space, you should save the contents of the cluster's pg_xlog sub-directory, as it might contain logs which were not archived before the system went down.
Use the following command to create a new folder which will contain all current data files before recovery:
mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main.before_recovery
Restore the Database Files from your File System Backup
Be sure that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using table spaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
Remove any files present in pg_xlog/ and copy all WAL segments (archive log files) into pg_xlog/
In this example, we will use the base backup #3 (bck3) to restore.
-
Copy from base backup, using the following command:
Copycp -rp /bck/pgbasebackup/bck3 /var/lib/postgresql/9.6/main -
Cleanup logs in pg_xlog, using the following command:
Copyrm -rf /var/lib/postgresql/9.6/main/pg_xlog/*
Create New recovery.conf File
There are two options for data recovery:
-
Restore to the most recent state, use only one parameter in recovery.conf file: restore_command: tells PostgreSQL how to retrieve archived WAL file segments.
-
Restore to the specific point in time, use two parameters in recovery.conf file: restore_command: tells PostgreSQL how to retrieve archived WAL file segments recovery_target_time: specifies the time stamp up to which recovery will proceed.
You can get a sample file from: c:\Program Files\PostgreSQL\9.6\share\recovery.conf.sample.
OR
Create a new empty file recovery.conf: /var/lib/postgresql/9.6/main/recovery.conf.
-
Save this content in that file to restore to a specific point in time, using the following command:
Copyrestore_command = 'cp /bck/pgarchive/%f %p' recovery_target_time = '2018-07-02 05:50:00.0' -
Or use only one parameter restore_command to restore to the most recent state, using the following command:
Copyrestore_command = 'cp /bck/pgarchive/%f %p'
Start the Database Server to Perform Recovery
Start the server, which will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery.
Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.
-
To start the database, use the following command:
Copysudo service postgresql start -
Review your log file 'PostgreSQL.log,' as seen in the example below:
-07-03 06:41:52 UTC [1634-852] LOG: restored log file "00000003000000080000009F" from archive 2018-07-03 06:41:52 UTC [1634-853] LOG: restored log file "0000000300000008000000A0" from archive 2018-07-03 06:41:52 UTC [1634-854] LOG: restored log file "0000000300000008000000A1" from archive 2018-07-03 06:41:53 UTC [1634-855] LOG: restored log file "0000000300000008000000A2" from archive 2018-07-03 06:41:53 UTC [1634-856] LOG: restored log file "0000000300000008000000A3" from archive 2018-07-03 06:41:53 UTC [1634-857] LOG: restored log file "0000000300000008000000A4" from archive 2018-07-03 06:41:53 UTC [1634-858] LOG: restored log file "0000000300000008000000A5" from archive 2018-07-03 06:41:54 UTC [1634-859] LOG: restored log file "0000000300000008000000A6" from archive 2018-07-03 06:41:54 UTC [1634-860] LOG: restored log file "0000000300000008000000A7" from archive 2018-07-03 06:41:54 UTC [1634-861] LOG: restored log file "0000000300000008000000A8" from archive 2018-07-03 06:41:54 UTC [1634-862] LOG: recovery stopping before commit of transaction 13924, time 2018-07-02 07:48:10.279609+00 2018-07-03 06:41:54 UTC [1634-863] LOG: redo done at 8/A801D758 2018-07-03 06:41:54 UTC [1634-864] LOG: last completed transaction was at log time 2018-07-02 05:26:23.275142+00 2018-07-03 06:41:54 UTC [1634-865] LOG: selected new timeline ID: 4 2018-07-03 06:41:55 UTC [1634-866] LOG: archive recovery complete 2018-07-03 06:41:55 UTC [1634-867] LOG: MultiXact member wraparound protections are now enabled 2018-07-03 06:41:55 UTC [1633-1] LOG: database system is ready to accept connections 2018-07-03 06:41:55 UTC [3538-1] LOG: autovacuum Launcher started
Setup Hot Replication
These instructions will use the example server information below:
-
Master Server IP: 10.0.102.246 and is configured using continuous WAL archiving as above.
-
Standby Server IP: 10.0.102.57
Master Server Configuration
-
Edit the pg_hba.conf to allow access from the standby server.
Example: host replication PostgreSQL 10.0.102.57/32 md5
-
Restart the Master Server.
Standby Server Configuration
-
Stop the database service, using the following command:
Copysudo systemctl stop postgresql -
Move the database folder to another location.
Example: mv /var/lib/postgresql/9.6/main /var/lib/postgresql/9.6/main_old
-
Stream data from the master server to the standby server, using the following command:
Copypg_basebackup -h <master_server> -p <port> -D <PostgreSQL/data/path> -U <user> -v -P --xlog-method=streamExample:
Copyubuntu@ip-10-0-102-57:~$ sudo -u PostgreSQL pg_basebackup -h 10.0.102.246 -D /var/lib/postgresql/9.6/main -U PostgreSQL -v -P --xlog-method=stream Password: pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed transaction log start point: 9/C2000028 on timeline 5 pg_basebackup: starting background WAL receiver 397557/397557 kB (100%), 1/1 tablespace transaction log end point: 9/C20000F8 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: base backup completed
Modify postgresql.conf
Use the following command to modify your postgresql.conf file:
hot_standby = on
Create the Recovery Configuration File recovery.conf
-
You can get a sample file from: /usr/share/postgresql/9.6/recovery.conf.sample OR
-
Create a new empty file recovery.conf: /var/lib/postgresql/9.6/main/recovery.conf
Specify the master server host with credentials in the parameter primary_conninfo, using the following command example:
standby_mode = on primary_conninfo = 'host=10.0.102.246 port=5432 user=PostgreSQL password=XXXXXX'
Start the Standby Server
-
Start the Standby Server, using the following command:
Copysudo systemctl start postgresql -
Review your log file 'PostgreSQL.log,' as seen in the example below:
2018-07-03 08:29:39.517 UTC [11849] LOG: received fast shutdown request 2018-07-03 08:29:39.517 UTC [11849] LOG: aborting any active transactions 2018-07-03 08:29:39.518 UTC [11854] LOG: autovacuum Launcher shutting down 2018-07-03 08:29:39.519 UTC [11851] LOG: shutting down 2018-07-03 08:29:39.534 UTC [11849] LOG: database system is shut down 2018-07-03 09:01:39.120 UTC [12142] LOG: database system was interrupted; last known up at 2018-07-03 08:49:20 UTC 2018-07-03 09:01:39.157 UTC [12142] LOG: entering standby mode 2018-07-03 09:01:39.164 UTC [12142] LOG: redo starts at 9/C2000028 2018-07-03 09:01:39.166 UTC [12142] LOG: consistent recovery state reached at 9/C20000F8 2018-07-03 09:01:39.166 UTC [12141] LOG: database system is ready to accept read only connections 2018-07-03 09:01:39.176 UTC [12146] LOG: started streaming WAL from primary at 9/C3000000 on timeline 5
Test the Hot Replication
-
On the Master Server, create a new table and insert one record, using the following command:
CopyqTestop=# create table ztmprep (name text); CREATE TABLE qTestop=# insert into ztmprep(name) values ('test replication'); INSERT 0 1 -
Query the record on the Standby Server you just inserted on the Master Server, using the following command:
CopyqTestop=# select * from ztmprep; name ------------------ test replication (1 row)The Standby Server is read-only, so no DML is allowed here. Therefore an error will be received. Example:
qTestop=# insert into ztmprep(name) values ('test insert in standby server'); ERROR: cannot execute INSERT in a read-only transaction
Promote Standby Server to Master Server
Use the following command to promote the Standby Server to the Master Server:
pg_ctl promote -D </path/to/data/folder>
Example:
$pg_ctl promote -D /var/lib/postgresql/9.6/main server promoting
Test Promoted Server
Test the Promoted Server with the following command:
testrep=# create table ztmppromote(name text); CREATE TABLE testrep=# insert into ztmppromote(name) values('test insert in promoted server'); INSERT 0 1
Update Your qTest Application Configuration to Use the Standby Database
-
Execute the qTestctl command to open the configuration wizard then update your PostgreSQL host.
Refer to 2026.2 OnPremises Single Server Upgrade Guide for Linux for a refresher on using the Command Line Wizard to update your PostgreSQL Host.
-
Restart the qTest service.