Incremental backups with rsync
A cron job on a server rsyncs all files from different locations to a single folder. Then that folder gets versioned and backed up again, so that we can go back in time. Optionally, everything can be encrypted and sent to cloud storage for additional safety.
So how does it work? First of all, some configuration. Let's make a backup_config.sh:
#!/bin/bash
BACKUPDISK=/home/backupdisk
#the backup folder
BACKUPDIR=$BACKUPDISK/backup
#folders by date and time
ARCHIVEDIR=$BACKUPDISK/archive
RSYNC_LOCK_FILE=/tmp/rsync.lock
#this script will populate the archive
INCREMENTAL=/home/s2/bin/backup/incremental.sh
#how much the disk can be full before deleting old archives
MAX_PERCENT_USED=85
Then we need to actually back up some stuff to $BACKUPDIR. rsync_black.sh is the script that backs up my Windows desktop PC (its name is black). The script connects to the remote PC over SSH and backs up the .gnupg, .ssh, AppData/Roaming/copyq and AppData/Roaming/Mozilla/Firefox/Profiles folders to $BACKUPDIR/black/Users/s2/:
#!/bin/bash
. `dirname $0`/backup_config.sh
host=black
(
flock -xn 200
if [ $? != 0 ]; then exit 0; fi;
ping -c1 $host>/dev/null 2>&1
if [ $? != 0 ]; then exit 0; fi
mkdir -p $BACKUPDIR/black
mkdir -p $BACKUPDIR/black/Users/s2/ && /home/s2/bin/backup/rsync-wrapper.sh \
-R -aqz --numeric-ids --delete-after --ignore-missing-args --exclude '*.lock' \
-e 'ssh' s2@$host:.gnupg \
:.ssh \
:AppData/Roaming/copyq \
:AppData/Roaming/Mozilla/Firefox/Profiles \
$BACKUPDIR/black/Users/s2/
$INCREMENTAL
) 200>$RSYNC_LOCK_FILE
The script uses rsync-wrapper.sh, because sometimes files on the running desktop vanish while being backed up, and we want to ignore this error. So we don't call rsync directly, but wrap it with rsync-wrapper.sh:
#!/usr/bin/env bash
REAL_RSYNC=/usr/bin/rsync
IGNOREEXIT=24
IGNOREOUT='^(file has vanished: |rsync warning: some files vanished before they could be transferred)'
# If someone installs this as "rsync", make sure we don't affect a server run.
for arg in "${@}"; do
if [[ "$arg" == --server ]]; then
exec $REAL_RSYNC "${@}"
exit $? # Not reached
fi
done
set -o pipefail
# This filters stderr without merging it with stdout:
{ $REAL_RSYNC "${@}" 2>&1 1>&3 3>&- | grep -E -v "$IGNOREOUT"; ret=${PIPESTATUS[0]}; } 3>&1 1>&2
if [[ $ret == $IGNOREEXIT ]]; then
ret=0
fi
exit $ret
Like rsync_black.sh above, we can create more scripts like that to back up other computers. For example, rsync_31337.it.sh backs up folders on a remote server:
#!/bin/bash
. `dirname $0`/backup_config.sh
(
flock -xn 200
if [ $? != 0 ]; then exit 0; fi;
mkdir -p $BACKUPDIR/31337.it
rsync -R -aqz --numeric-ids --delete-after -e 'ssh -p 22022' \
--exclude /home/n3wz/n3wz/run/solr/solr-5.3.1/server/solr/fcku/data \
root@vps.31337.it:/home/vmail \
:/home/s2 \
:/root \
:/home/n3wz \
:/etc/letsencrypt \
:/etc/postfix \
:/etc/postgresql \
:/etc/postgresql-common \
:/etc/apache2 \
:/etc/news \
:/etc/prosody \
:/etc/dovecot \
:/etc/default \
:/etc/dkimkeys \
:/etc/opendkim.conf \
:/etc/opendmarc.conf \
:/var/spool/cron/crontabs \
:/var/www \
$BACKUPDIR/31337.it/
$INCREMENTAL
) 200>$RSYNC_LOCK_FILE
Now we have all the folders from all our remote computers in $BACKUPDIR. We need to version them. $INCREMENTAL is responsible for that. This is incremental.sh:
#!/bin/bash
. `dirname $0`/backup_config.sh
CURBACKUPDATE=`date "+%F_%T"`
BACKUPSUBDIR=`date "+%Y/%m/%d"`
mkdir -p $ARCHIVEDIR/$BACKUPSUBDIR &&
nice -n 10 rsync -aq --inplace --numeric-ids \
--link-dest=$ARCHIVEDIR/backup_current \
$BACKUPDIR/ \
$ARCHIVEDIR/$BACKUPSUBDIR/backup_$CURBACKUPDATE &&
(cd $ARCHIVEDIR && rm backup_current && ln -sf $BACKUPSUBDIR/backup_$CURBACKUPDATE backup_current)
In $ARCHIVEDIR we create a lot of folders named with date and time, each containing our backup at a given point in time, so we can recover everything in case of data loss.
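For example, to restore a single folder from a given snapshot, we can rsync it back to the machine it came from. A minimal sketch, where the snapshot date and destination are only examples:
# restore the Firefox profiles from one snapshot back to black
# (example snapshot path; pick the date and time you need)
rsync -aqz --numeric-ids -e 'ssh' \
  /home/backupdisk/archive/2024/02/01/backup_2024-02-01_04:32:45/black/Users/s2/AppData/Roaming/Mozilla/Firefox/Profiles/ \
  s2@black:AppData/Roaming/Mozilla/Firefox/Profiles/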
When our backup disk reaches $MAX_PERCENT_USED, we need to clean up by deleting the oldest archives. To do that, we use remove_old.sh:
#!/bin/bash
. `dirname $0`/backup_config.sh
get_used_space() {
local inodes=`df -i $BACKUPDISK|grep 'dev'|awk '{print $5}'|sed 's/%//'`
local space=`df $BACKUPDISK|grep 'dev'|awk '{print $5}'|sed 's/%//'`
if [ $inodes -gt $space ]; then
used_space=$inodes
else
used_space=$space
fi
echo used space is $used_space
}
check_used_space() {
get_used_space
if [ $used_space -lt $MAX_PERCENT_USED ]; then
echo used space is less than $MAX_PERCENT_USED
echo removing empty directories
sudo find $ARCHIVEDIR/????/?? -depth -maxdepth 1 -empty -type d -exec rm -rf {} \;
sudo find $ARCHIVEDIR/???? -depth -maxdepth 2 -empty -type d -exec rm -rf {} \;
echo exiting
exit 0
fi
}
(
flock -xn 200
if [ $? != 0 ]; then exit 0; fi;
check_used_space
for stuff in `ls -tr1d $ARCHIVEDIR/????/??/??`; do
echo deleting $stuff
sudo rm -rf $stuff
check_used_space
done
) 200>$RSYNC_LOCK_FILE
With all these scripts in one folder, we have everything in place. I keep them in my home dir, in /home/s2/bin/backup:
/home/s2/bin/backup
|
├─ backup_config.sh
├─ incremental.sh
├─ remove_old.sh
├─ rsync_31337.it.sh
├─ rsync_black.sh
├─ rsync_home.sh
├─ rsync_laptop.sh
└─ rsync-wrapper.sh
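rsync_home.sh and rsync_laptop.sh follow the same pattern as rsync_black.sh. As a rough sketch of what rsync_home.sh could look like (the source folders below are placeholders, not the real ones):
#!/bin/bash
. `dirname $0`/backup_config.sh
(
flock -xn 200
if [ $? != 0 ]; then exit 0; fi;
# local folders, no ssh needed (placeholder paths)
mkdir -p $BACKUPDIR/home && rsync -R -aq --numeric-ids --delete-after \
/home/s2/Documents \
/home/s2/.config \
$BACKUPDIR/home/
$INCREMENTAL
) 200>$RSYNC_LOCK_FILE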
So, now if we run rsync_31337.it.sh, all the files from the computer 31337.it get copied over to our $BACKUPDIR, and incremental.sh will then create our backup folder in archive, representing our files at this specific point in time. $ARCHIVEDIR will look like this:
/home/backupdisk/archive/
├── 2023
│ ├── 04
│ │ ├── 23
│ │ │ ├── backup_2023-04-23_16:31:34
│ │ │ └── backup_2023-04-23_18:30:29
│ │ ├── 24
│ │ │ └── backup_2023-04-24_10:40:45
[...]
├── 2024
│ ├── 01
│ │ ├── 01
│ │ │ └── backup_2024-01-01_04:30:12
│ │ └── 29
│ │ └── backup_2024-01-29_04:32:43
│ └── 02
│ ├── 01
│ │ └── backup_2024-02-01_04:32:45
│ └── ...
└── backup_current -> 2024/02/27/backup_2024-02-27_04:30:19
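Thanks to --link-dest, files that did not change between runs are hard links into the previous snapshot, so each new dated folder takes up almost no extra space. We can check that by comparing inodes across two snapshots (the file below is just an example path):
# an unchanged file has the same inode and a link count > 1 in both snapshots
stat -c '%i %h %n' \
  /home/backupdisk/archive/2024/01/01/backup_2024-01-01_04:30:12/31337.it/etc/postfix/main.cf \
  /home/backupdisk/archive/2024/01/29/backup_2024-01-29_04:32:43/31337.it/etc/postfix/main.cf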
Now we just need to run them periodically with cron:
#backup
30 4 * * * sudo /home/s2/bin/backup/rsync_home.sh
30 */2 * * * sudo /home/s2/bin/backup/rsync_black.sh
45 */2 * * * sudo /home/s2/bin/backup/remove_old.sh >/dev/null
Done.
Optionally, we could compress and encrypt our $BACKUPDIR or $ARCHIVEDIR, and upload everything to some cloud storage for additional safety.
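A minimal sketch of that last step, using tar, gpg and rclone; the recipient key and the mycloud:backups remote are placeholders, not part of the actual setup:
#!/bin/bash
. `dirname $0`/backup_config.sh
# tar up the latest backup, encrypt it, and push it to a cloud remote
# (backup@example.org and mycloud:backups are placeholders)
TODAY=`date +%F`
tar -C $BACKUPDISK -czf - backup | \
  gpg --encrypt --recipient backup@example.org --output /tmp/backup_$TODAY.tar.gz.gpg
rclone copy /tmp/backup_$TODAY.tar.gz.gpg mycloud:backups/ && rm /tmp/backup_$TODAY.tar.gz.gpg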