Streaming only PostgreSQL replication

My production PostgreSQL database is chugging along fine, but I have a bad feeling in my gut: if the box hosting that db goes down, I lose all my data. I make daily backups, but potentially losing 24 hours of data is much.

Time to set up a replicated read only server on an other box, that receives all the data from the master server asynchronously in real time.

Preparing the master server

First create a user for the slave db:

1
sudo -u postgres psql -c "CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'x';"

Next we need to grant replication access to the replicator user: in pg_hba.conf add

1
2
3
4
5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication replicator peer
host replication replicator 127.0.0.1/32 md5
host replication replicator ::1/128 md5

in postgresql.conf set

1
2
3
4
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 10
max_replication_slots = 10

wal_keep_segments = 10 keeps 160MB of data before removing it on the master. If you think your database writes more than that in the time it takes to sync over the network to the slave, increase it.

Then we create a replication slot for our slave to use:

1
sudo -u postgres psql -x -c "SELECT * FROM pg_create_physical_replication_slot('replication_slot_1');"

(If you want to remove the slot, the command is select pg_drop_replication_slot('replication_slot_1');)

The master is ready. We just need to restart it:

1
sudo service postgresql restart

We are ready to get connections from slaves.

Setting up a slave

Since on the master we only accept connections from localhost (I don't like to expose the postgres port on the internet), we set up an ssh tunnel from the slave to the master:

1
while true; do (echo JUST TUNNELING && ssh -L5433:127.0.0.1:5432 -N user@master-pg-host); date; done;

or make a systemd service or something. I like to have a script like this always running in my screen.

Then, make the slave pg a slave by editing postgres.conf:

1
2
3
4
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 10
hot_standby = on

Now we need to

  • create a backup from the master
  • import it on the slave
  • start the slave.

To do that, we can use this script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
echo Stopping PostgreSQL
sudo service postgresql stop

echo Cleaning up old cluster directory
sudo -u postgres rm -rf /var/lib/postgresql/9.5/main

echo Starting base backup as replicator
sudo -u postgres PGPASSWORD=x pg_basebackup -h 127.0.0.1 -p 5433 -D /var/lib/postgresql/9.5/main -U postgres -v -P

echo Writing recovery.conf file
sudo -u postgres bash -c "cat > /var/lib/postgresql/9.5/main/recovery.conf <<- _EOF_
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5433 user=replicator password=x'
primary_slot_name = 'replication_slot_1'
_EOF_
"

echo Startging PostgreSQL
sudo service postgresql start

When the slave restarts, it will sync with the lost transactions on the master and you have a running live backup!

Checking if it works

On the master, execute

1
sudo -u postgres psql -x -c "select * from pg_stat_replication;"

you should get an output like

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-[ RECORD 1 ]----+------------------------------
pid | 27420
usesysid | 10
usename | postgres
application_name | walreceiver
client_addr | 127.0.0.1
client_hostname |
client_port | 57522
backend_start | 2017-10-15 23:26:31.838917+02
backend_xmin |
state | streaming
sent_location | 0/939DFFC8
write_location | 0/939DFFC8
flush_location | 0/939DFFC8
replay_location | 0/939DFFC8
sync_priority | 0
sync_state | async

You can also check if the replication slot is used on the master:

1
2
3
4
5
6
=> SELECT slot_name, slot_type, active, xmin, catalog_xmin, restart_lsn FROM pg_replication_slots;
slot_name | slot_type | active | xmin | catalog_xmin | restart_lsn
--------------------+-----------+--------+------+--------------+-------------
replication_slot_1 | physical | t | | | 17/2D8EC6A0
(1 row)


If it is active it is working.

disable and enable replication from the slave

Sometimes when running heavy queries on the slave, it may become necessary to disable replication for the duration of the query.

postgres < 10

1
2
3
select pg_xlog_replay_pause();
--run query
select pg_xlog_replay_resume();

postgres >= 10

1
2
3
select pg_wal_replay_pause();
--run query
select pg_wal_replay_resume();

avatar

Simons stuff