NOTE: This document is not finished yet. It is in the process of being updated for a LUG presentation sometime in the future. I am leaving it online as I work on it because I hope it will be useful even though there may be errors in it. The original (and now obsolete) version is available here: http://www.sanitarium.net/golug/rsync_backups.html
Backups using rsync
Written by Kevin Korb
as a presentation for GOLUG
[to be] Presented Sometime in 2010
This document is available at http://www.sanitarium.net/golug/rsync_backups_2010.html
- What is rsync?
Rsync is a program for synchronizing two directory trees across different filesystems even if they are on different computers. It can run its host to host communications over ssh to keep things secure and to provide key based authentication. Rsync also includes features that allow it to skip unnecessary parts of data. If a file is already present in the target and is the same as on the source the file will not be transmitted. If the file on the target is different than the one on the source then only the parts of it that are different are transferred. These features greatly increase the performance of rsync over the network.
- What are hard links?
Hard links are similar to symlinks. They are normally created using the ln command but without the -s switch. A hard link is when two file entries point to the same inode and disk blocks. Unlike symlinks there isn't a file and a pointer to the file but rather two links to the same file. If you delete either entry the other will remain and will still contain the data. Here is an example of both:
------- Symbolic Link Demo -------
% echo foo > x
% ln -s x y
% ls -li ?
38062 -rw-r--r-- 1 kmk users 4 Jul 25 14:28 x
38066 lrwxrwxrwx 1 kmk users 1 Jul 25 14:28 y -> x
# As you can see, y is only a pointer to x.
% grep . ?
x:foo
y:foo
# They contain the same data.
% rm x
% ls -li ?
38066 lrwxrwxrwx 1 kmk users 1 Jul 25 14:28 y -> x
% grep . ?
grep: y: No such file or directory
# Now that x is gone y is simply broken.
------- Hard Link Demo -------
% echo foo > x
% ln x y
% ls -li ?
38062 -rw-r--r-- 2 kmk users 4 Jul 25 14:28 x
38062 -rw-r--r-- 2 kmk users 4 Jul 25 14:28 y
# They are the same file occupying the same disk space.
% grep . ?
x:foo
y:foo
% rm x
% ls -li ?
38062 -rw-r--r-- 1 kmk users 4 Jul 25 14:28 y
% grep . ?
y:foo
# Now y is simply an ordinary file.
------- Breaking a Hard Link -------
% echo foo > x
% ln x y
% ls -li ?
38062 -rw-r--r-- 2 kmk users 4 Jul 25 14:34 x
38062 -rw-r--r-- 2 kmk users 4 Jul 25 14:34 y
% grep . ?
x:foo
y:foo
% unlink y ; echo bar > y
% ls -li ?
38062 -rw-r--r-- 1 kmk users 4 Jul 25 14:34 x
38066 -rw-r--r-- 1 kmk users 4 Jul 25 14:34 y
% grep . ?
x:foo
y:bar
Why backup with rsync instead of something else?
- Disk based: Rsync is a disk based backup system. It doesn't use tapes which are too slow to backup modern systems with large hard drives.
- Fast: Rsync only backs up what has changed since the last backup. It NEVER has to repeat the full backup unlike most other systems that have monthly/weekly/daily differential configurations.
- Less work for the backup client: Most of the work in rsync backups including the rotation process is done on the backup server which is usually dedicated to doing backups. This means that the client system being backed up is not hit with as much load as with some other backup programs. The load can also be tailored to your particular needs through several rsync options and backup system design decisions.
- Fastest restores possible: If you just need to restore a single file or set of files it is as simple as a cp or scp command. Restoring an entire filesystem is just a reverse of the backup procedure. Restoring an entire system is a bit long but is less work than backup systems that require you to reinstall your OS first and about the same as other manual backup systems like dump or tar.
- Only one restore needed: Even though each backup is an incremental they are all accessible as full backups. This means you only restore the backup you want instead of restoring a full and an incremental or a monthly followed by a weekly followed by a daily.
- Cross Platform: Rsync can backup and recover anything that can run rsync. I have used it to backup Linux, Windows, DOS, OpenBSD, Solaris, and even ancient SunOS 4 systems. The only limitation is that the filesystem that the backups are stored on must support all of the file metadata that the filesystems containing files to be backed up supports. In other words if you were to use a vfat filesystem for your backups you would not be able to preserve file ownership when backing up an ext3 filesystem. If this is a problem for you try looking into rdiff-backup.
- Cheap: It doesn't seem like it would be cheap to have enough disk space for 2 copies of everything and then some but it is. With tape drives you have to choose between a cheap drive with expensive tapes or an expensive drive with cheap tapes. In a hard drive based system you just buy cheap hard drives and use RAID to tie them together. My current backup server uses two 500GB IDE drives in a software RAID0 configuration for a total of 1TB for about $100 which is about 1/6th what I paid for the DDS3 tape drive that I used to use and that doesn't even include the tapes that cost about $10/12GB.
- Internet: Since rsync can run over ssh and only transfers what has changed it is perfect for backing up things across the internet if you need to do so. This is perfect for backing up a web site at a web hosting company or even a colocated server. It is also useful for backing up to the internet using various services.
- Do-it-yourself: There are FOSS backup packages out now that use rsync as their back end but the nice thing here is that you are using standard command line tools (rsync, ssh, rm) so you can engineer your own backup system that will do EXACTLY what you want and you don't need a special tool to restore.
Why/When wouldn't you want to use rsync for backups?
- Databases: Rsync is a file level backup so it is not suitable for databases. If your primary data is databases then you should look somewhere else. If you have databases but they are not your primary data then there is a procedure below to integrate a database backup into the rsync backups.
- Windows: If you plan to backup Windows boxes then rsync probably isn't for you. It is possible to backup Windows boxes with rsync but the system recovery process is UGLY and if you want a complete backup of the OS you will have to boot the computer into Linux to be able to read some of the files.
- Compression: Since rsync doesn't put the files into any kind of archive there is no compression at all. In most cases it is still more cost effective to store uncompressed data on a hard drive than it is to store compressed data on a tape or some other media but this might not be true for everyone. Also, most modern file formats are already compressed so in many cases the compression wouldn't help anyways.
- Commercial support: Like most of the stuff I talk about there is no real commercial support for this. If you want a backup software vendor that you can call and beg for help from then go buy some big commercial backup system but expect to pay a ton of money for something that isn't anywhere near as flexible as rsync.
- Security: Since rsync runs over ssh you would normally set it up so that root on your backup server can ssh into all of your other machines as root without a password. This means that the security of your backup server becomes very important as anyone who roots it can root any other server with one command. There are ways that you could design around this or you could simply require the person running the backup to type in the root passwords as it goes but those solutions all over-complicate things. Giving your backup server all of the keys isn't really as bad as it sounds though when you consider that in any other backup system the backup server would still have some kind of root access to the other servers as well as a complete copy of them that a hacker could use to break in. Note that it is possible to restrict the ssh key used by the backups to only work from the backup server using the from= parameter in the authorized_keys file.
- Do-it-yourself: This is still a do-it-yourself system. You have to decide how you want your backups to work and how you want them organized. If you don't want to write/modify shell scripts then look for something else or look at the available backup systems that use rsync as their back end.
Why not just use RAID / Is this like using RAID-1? / Is this like DRBD?
I don't think I can ever say this enough times.... RAID is NOT a backup system! RAID (other than level 0) does a wonderful job of protecting your data from disk failures. However, it provides absolutely NO protection against file corruption, files destroyed by a virus or a hacker, or the "oops, I deleted the wrong file" problem which most of us have encountered. There is a time and a place for RAID and RAID is not always needed however data should ALWAYS be backed up regardless of what media it is stored on or how redundant that media may be. Networked mirroring solutions like DRBD have the same drawbacks as RAID as they are a simple realtime mirror. My general rule of thumb is that if you can't restore your data to the way it was last Monday using a storage device other than the one the data was on last Monday then you don't have a backup system.
Do I need to backup the OS or just the data?
In my opinion yes, you need to backup the OS as well as the data. Many people feel that the OS is easily recreated by doing a re-install plus loading a list of applications that was saved during the backup run. While this is true in theory it isn't so easy in practice. If you ever have the catastrophic loss of a server you will find out very quickly that every minute counts. If you have a backup of the OS and an established and practiced procedure for restoring it the recovery will go very quickly and it will probably work the first time. If your recovery procedure includes "install the OS" and "install all the applications" expect to add a full day of listening to users complain while you do those steps. Also, in terms of gigabytes, the OS is usually tiny compared to the data it supports. The extra disk space required to backup the OS will probably not even make a difference in the choice of how big to make the backup system. With the typical ratio of OS vs data it is just silly to not backup the OS.
Why all this talk of a backup server? Why not just use an external hard drive?
While it is completely possible to do the backups this way (and I have done it this way myself) there are a couple of drawbacks...
- Security: One of the reasons we have backups is because of the possibility of malicious activity (hackers, worms, trojans, etc). If your backup device is plugged into a computer being backed up then any malicious activity that can destroy your data can also destroy your backups. Keeping your backups on a separate isolated server protects them from this possibility.
- Performance: Rsync's ability to transfer only the parts of a file that have changed does not work on local transfers. This is because the feature would actually be counter-productive on a local transfer. Rsync would have to read and hash both versions of the file then write out the new version of the file instead of simply reading from the source and writing to the target. Also, most external hard drives are USB which is a pretty slow interface. Note that this is also true if the source or destination path is a network mount point instead of a network transport to a remote filesystem. Read the rsync man page section on --whole-file for more information.
What about a Network Attached Storage (NAS) device instead of a server?
How do you do offsite/offline backups with rsync?
The best way to do an offsite or offline backup is to do the rsync backup like normal and then backup the backup to tape or whatever media you want to use for your offline/offsite backups. This gives you all the speed advantages of rsync during the actual backups and restores while allowing you to do the slower tape backups during the day when the backup server would otherwise be idle. Note that I do not recommend using removable hard drives for offsite rsync backups. Hard drives have very fragile moving parts and if you are constantly transporting them around they will not last long and will probably fail when you need them most as that is when they will be transported.
How do you handle databases?
Databases can't just be backed up like files. This is because database engines are constantly making changes to the database files at the block level. If you backed them up with a file based tool like rsync the backup would be inconsistent and possibly even unusable. The best way to backup most databases is to take an LVM snapshot of the database then rsync backup the snapshot of the database. This allows you to have all the advantages of an rsync backup with as little impact to the running database as possible. If you can't use LVM snapshots then your next best bet is to use the database specific tools to dump the database contents to files that can be backed up. If all else fails you can lock or shutdown the database engine so the files are not changing during the backup but this will be a huge impact outage.
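The snapshot approach could be sketched roughly as follows. This is only an illustration, not a tested procedure for any particular database: the function name with_snapshot, the snapshot name, and the 1G copy-on-write size are assumptions, and it has to run as root on the machine that owns the volume group.

```shell
#!/bin/sh
# with_snapshot VG LV MNT CMD...
# Run CMD while a snapshot of /dev/VG/LV is mounted read-only on MNT,
# then tear the snapshot down again whether or not CMD succeeded.
# The snapshot name and the 1G size are assumptions for this sketch.
with_snapshot() {
    vg=$1 lv=$2 mnt=$3
    shift 3
    snap=${lv}_backup_snap

    lvcreate --snapshot --size 1G --name "$snap" "/dev/$vg/$lv" || return 1
    mkdir -p "$mnt"
    if mount -o ro "/dev/$vg/$snap" "$mnt"; then
        "$@"                 # the backup itself happens here
        status=$?
        umount "$mnt"
    else
        status=1
    fi
    lvremove -f "/dev/$vg/$snap"
    return "$status"
}
```

CMD is whatever copies the frozen files, for example an rsync of MNT. The snapshot needs enough copy-on-write space to absorb every write the database makes while the backup runs.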
How much space does it take to do rsync backups while keeping old copies?
This completely depends on how much change there is between each backup and how many backups you retain. I have seen it as low as 5% and as high as 40% but it is completely dependent on your data and your retention policy.
Organizing backups
Since this is a do-it-yourself system this is totally up to you to design. I have my backup storage mounted under /backup and put all of my rsync backups under /backup/rsync. Within that directory I make a directory for each host that gets backed up. Then for each backup of each file system I change '/' to '_' in the mount point name and time stamp the filesystem so my backup of /home/asylum done at 17:47 on 2005-07-25 would be stored in /backup/rsync/asylum/_home_asylum.2005-07-25.17-47-42. When the backup is done I would create a symlink named /backup/rsync/asylum/_home_asylum.current that points to that directory to make it easier to find, especially from scripts.
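As a rough sketch, the naming scheme described here could be wrapped in a few shell functions. The function names and the BACKUP_ROOT variable are assumptions made for this example; only the directory layout comes from the text above.

```shell
#!/bin/sh
# Sketch of the naming scheme described above. BACKUP_ROOT and the
# function names are assumptions made for this example.
BACKUP_ROOT=${BACKUP_ROOT:-/backup/rsync}

# Turn a mount point into a directory name: /home/asylum -> _home_asylum
fs_name() {
    printf '%s\n' "$1" | sed 's|/|_|g'
}

# Print the path for a new backup of filesystem $2 on host $1.
new_backup_dir() {
    printf '%s/%s/%s.%s\n' "$BACKUP_ROOT" "$1" "$(fs_name "$2")" \
        "$(date +%Y-%m-%d.%H-%M-%S)"
}

# Repoint the .current symlink for host $1, filesystem $2 at backup dir $3.
# The link target is relative so the whole tree can be moved later.
mark_current() {
    ln -sfn "$(basename "$3")" "$BACKUP_ROOT/$1/$(fs_name "$2").current"
}
```

A script would call new_backup_dir, mkdir -p the result, run the backup into it, and call mark_current only after the backup has finished, so that .current never points at a partial backup.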
Rotating backups
Rsync does the incremental backups using "hard links" and the --link-dest parameter. However, it has no mechanism for purging old backups when they reach a predefined age. The purging is usually done with a simple rm -rf of the oldest backup(s) as needed.
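A purge step could look something like the sketch below. It keeps a fixed number of backups rather than purging by age, which is a simplification I am assuming for the example; the function name and its arguments are hypothetical too.

```shell
#!/bin/sh
# purge_old HOSTDIR NAME KEEP
# Remove all but the KEEP newest backups named NAME.<timestamp> in HOSTDIR.
# The function name and argument order are assumptions for this sketch.
purge_old() {
    hostdir=$1 name=$2 keep=$3
    # The timestamp format sorts chronologically and the shell expands
    # globs in sorted order, so the oldest backup comes first.
    # The [0-9] after the dot skips the NAME.current symlink.
    set -- "$hostdir/$name".[0-9]*
    [ -e "$1" ] || return 0    # no backups matched
    while [ "$#" -gt "$keep" ]; do
        rm -rf -- "$1"
        shift
    done
}
```

For example, purge_old /backup/rsync/asylum _home_asylum 10 would leave the ten newest backups of /home/asylum in place. Because every backup is complete on its own, removing the oldest directories never damages the newer ones.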
Here is how the organization with the hard links looks:
You can determine the current backup with:
# readlink _home_asylum.current
_home_asylum.2005-07-25.15-32-42
Here is an example of 10 backups of my home directory:
# du -shc _home_asylum.2*
9.7G _home_asylum.2005-06-21.15-29-25
161M _home_asylum.2005-06-22.20-12-01
207M _home_asylum.2005-06-30.18-36-21
125M _home_asylum.2005-07-01.12-15-05
173M _home_asylum.2005-07-05.11-05-34
181M _home_asylum.2005-07-07.13-43-22
176M _home_asylum.2005-07-07.17-22-09
234M _home_asylum.2005-07-13.11-14-32
160M _home_asylum.2005-07-18.16-32-54
168M _home_asylum.2005-07-25.15-32-42
12G total
# foreach f (_home_asylum.2*)
foreach? du -sh $f
foreach? end
9.7G _home_asylum.2005-06-21.15-29-25
9.7G _home_asylum.2005-06-22.20-12-01
9.7G _home_asylum.2005-06-30.18-36-21
9.7G _home_asylum.2005-07-01.12-15-05
9.7G _home_asylum.2005-07-05.11-05-34
9.8G _home_asylum.2005-07-07.13-43-22
9.8G _home_asylum.2005-07-07.17-22-09
9.8G _home_asylum.2005-07-13.11-14-32
9.7G _home_asylum.2005-07-18.16-32-54
9.8G _home_asylum.2005-07-25.15-32-42
Note that each backup individually is complete but when taken together there is only a small increase in disk usage. This concept is the key of rsync incremental backups.
Actually backing up
Now we get to actually look at rsync. When you run rsync you will tell it to backup the live filesystem into a new empty directory and to look to the previous backup for files that have already been backed up. Whenever rsync finds a new file it will copy over that file. Whenever it finds a modified file it will copy over the differences making a new file in the new backup directory but leaving the old version of the file as it was in the old backup directory. When rsync finds a file that has not changed since the last backup it will simply be hard linked into the new backup directory requiring almost no additional disk space. There is a wide variety of options that can be used with rsync to tailor it to your specific needs but here is what I would usually use:
rsync -vaxH --progress --numeric-ids --delete \
--exclude-from=asylum_backup.excludes --delete-excluded \
--link-dest=/backup/rsync/asylum/_home_asylum.2005-07-25.15-32-42 \
root@asylum:/home/asylum/ /backup/rsync/asylum/_home_asylum.[current date]/
Now I will explain the components of that rather long command...
- rsync: Duh, the rsync command ;)
- -v: Verbose. This causes rsync to list each file that it touches. I would leave this out if running from cron.
- -a: Archive. This causes rsync to maintain things like file permissions, ownerships, and timestamps.
- -H: Hard Links. This causes rsync to maintain hard links that are on the server being backed up. This has nothing to do with the hard links used during the rotation.
- -x: One File System. This causes rsync to NOT recurse into other filesystems. If you use this like I do then you must backup each filesystem (mount point) one at a time. The alternative is to simply backup / and exclude things you don't want to backup (like /proc, /sys, /tmp, and any network or removable media mounts).
- --progress: This adds to the -v and tells rsync to print out a %completion and transfer speed while transferring large files. I would definitely leave this out when running from cron!
- --numeric-ids: This tells rsync to not attempt to translate UID <> userid or GID <> groupid. This is very important when doing backups and restores. If you are doing a restore from a live cd such as Knoppix your file ownerships will be completely screwed up if you leave this out.
- --delete: This tells rsync to delete files that are no longer on the server from the backup. If disk space is tight you may want to add --delete-before to force rsync to delete files before adding anything new however this will decrease performance as rsync will have to index the entire filesystem on both ends looking for things to delete before it can start transferring anything.
- --exclude-from=asylum_backup.excludes: This is a plain text file with a list of paths that I do not want backed up on this particular server. The format of the file is simply one path per line. I tend to add things that will always be changing but are unimportant such as unimportant log files. If you have a ~/.gvfs entry you should add it too as it will cause a non-fatal error.
- --delete-excluded: This tells rsync that it can delete stuff from a previous backup that is now within the excluded list.
- root@: This is the userid given to rsync which it will then use to ssh to the server getting backed up.
- asylum:: This is the hostname that rsync will ssh to.
- /home/asylum/: This is the path on the server that is to be backed up. Note that the trailing slash IS significant.
- --link-dest=/backup/rsync/asylum/_home_asylum.2005-07-25.15-32-42: This is the most recent complete backup that was current when we started. We are telling rsync to link to this backup for any files that have not changed.
- /backup/rsync/asylum/_home_asylum.[current date]/: This is the empty directory we are going to backup to. It should be created with mkdir -p first. Note that the trailing slash is significant here as well.
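Put together, the pieces above might be scripted along these lines. This is a sketch for cron use (so -v and --progress are left out), not my actual backup script; backup_fs, BACKUP_ROOT, and EXCLUDES are names invented for the example.

```shell
#!/bin/sh
# Hypothetical wrapper around the rsync command explained above.
# backup_fs, BACKUP_ROOT and EXCLUDES are assumptions for this sketch.
BACKUP_ROOT=${BACKUP_ROOT:-/backup/rsync}
EXCLUDES=${EXCLUDES:-/dev/null}     # an empty exclude list by default

# backup_fs HOST MOUNTPOINT
backup_fs() {
    host=$1 mount=$2
    name=$(printf '%s' "$mount" | sed 's|/|_|g')
    hostdir=$BACKUP_ROOT/$host
    new=$hostdir/$name.$(date +%Y-%m-%d.%H-%M-%S)
    current=$hostdir/$name.current

    mkdir -p "$new" || return 1

    # Only pass --link-dest when a previous backup exists; the very
    # first run has nothing to link against.
    if [ -d "$current" ]; then
        set -- "--link-dest=$(cd "$current" && pwd -P)"
    else
        set --
    fi

    rsync -axH --numeric-ids --delete \
        --exclude-from="$EXCLUDES" --delete-excluded \
        "$@" "root@$host:${mount%/}/" "$new/" || return 1

    # Repoint .current only after rsync has succeeded.
    ln -sfn "$(basename "$new")" "$current"
}
```

Calling backup_fs asylum /home/asylum would then produce the same kind of run as the command above. Leaving .current alone when rsync fails means the next run still links against the last complete backup.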
There is also an environment variable that rsync uses to determine what command to use for its network communications. Here is the variable that I use:
RSYNC_RSH "ssh -c arcfour -o Compression=no -x"
Now I will explain the components of that variable...
- ssh: use ssh instead of the default of rsh.
- -c arcfour: Uses the weakest but fastest encryption that ssh supports
- -o Compression=no: Turns off ssh's compression. Rsync has its own if you want it.
- -x: Turns off ssh's X tunneling feature (if you actually have it on by default)
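How the variable gets set depends on your shell. Here is a small sketch for Bourne-style shells, with the csh form and rsync's own -e option shown as comments. As far as I know, newer OpenSSH releases no longer include the arcfour cipher, so it is worth checking the output of ssh -Q cipher before copying the value as-is.

```shell
# Bourne-style shells (sh, bash, ksh, zsh):
RSYNC_RSH="ssh -c arcfour -o Compression=no -x"
export RSYNC_RSH

# csh/tcsh equivalent, shown as a comment because the syntax differs:
#   setenv RSYNC_RSH "ssh -c arcfour -o Compression=no -x"

# The same setting for a single run, without touching the environment:
#   rsync -e "ssh -c arcfour -o Compression=no -x" ...
```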
Recovering files from backups
Because rsync doesn't put the backed up files into any kind of archive this is as simple as copying a file. Just find the file you need on the backup server and copy it to where you need it to be. If you are restoring it to another server just use rsync or scp to get it there. Here are 2 examples of files that can be restored from my home directory:
# ls -li _home_asylum.2*/kmk/bin/encode
3605946 5 kmk users 2223 Jul 2 11:34 _home_asylum.2005-07-05.11-05-34/kmk/bin/encode
3605946 5 kmk users 2223 Jul 2 11:34 _home_asylum.2005-07-07.13-43-22/kmk/bin/encode
3605946 5 kmk users 2223 Jul 2 11:34 _home_asylum.2005-07-07.17-22-09/kmk/bin/encode
3605946 5 kmk users 2223 Jul 2 11:34 _home_asylum.2005-07-13.11-14-32/kmk/bin/encode
3605946 5 kmk users 2223 Jul 2 11:34 _home_asylum.2005-07-18.16-32-54/kmk/bin/encode
4853134 1 kmk users 4012 Jul 21 19:31 _home_asylum.2005-07-25.15-32-42/kmk/bin/encode
# ls -li _home_asylum.2*/kmk/bin/mp3db
4074469 1 kmk users 29598 Jun 19 16:01 _home_asylum.2005-06-21.15-29-25/kmk/bin/mp3db
4082467 1 kmk users 29943 Jun 22 19:10 _home_asylum.2005-06-22.20-12-01/kmk/bin/mp3db
4124342 1 kmk users 30570 Jun 30 17:22 _home_asylum.2005-06-30.18-36-21/kmk/bin/mp3db
2617551 1 kmk users 30701 Jul 1 12:17 _home_asylum.2005-07-01.12-15-05/kmk/bin/mp3db
3605948 1 kmk users 35604 Jul 1 16:50 _home_asylum.2005-07-05.11-05-34/kmk/bin/mp3db
4411207 2 kmk users 35668 Jul 6 11:06 _home_asylum.2005-07-07.13-43-22/kmk/bin/mp3db
4411207 2 kmk users 35668 Jul 6 11:06 _home_asylum.2005-07-07.17-22-09/kmk/bin/mp3db
4523360 1 kmk users 37041 Jul 9 17:28 _home_asylum.2005-07-13.11-14-32/kmk/bin/mp3db
4675812 1 kmk users 37201 Jul 18 09:50 _home_asylum.2005-07-18.16-32-54/kmk/bin/mp3db
4853138 1 kmk users 37200 Jul 19 16:46 _home_asylum.2005-07-25.15-32-42/kmk/bin/mp3db
As you can see my encode script has been fairly constant while my mp3db script has changed almost every time I have run a backup. I can choose to restore whichever version I want as they are all just plain files.
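The link counts in the listings above can also be used from a script. The sketch below lists the files that exist in one backup only, meaning their link count is 1; unique_files is a name made up for the example. Files that are hard linked on the original system (and preserved with -H) will have higher counts, so treat the result as a hint rather than an exact answer.

```shell
#!/bin/sh
# unique_files BACKUPDIR
# List regular files whose link count is 1, i.e. versions that no other
# backup shares. The function name is an assumption for this sketch.
unique_files() {
    find "$1" -type f -links 1
}
```

Those are also the files whose space would actually be freed if that one backup were deleted.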
Recovering entire filesystems from backups
This is a simple reverse of the backup procedure. Just format the new filesystem and rsync the files back to it and make sure you use the same rsync options especially -a and --numeric-ids.
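A restore could be sketched like this, assuming the target filesystem is already formatted and mounted on the machine being restored. The function name restore_fs and its argument order are invented for the example.

```shell
#!/bin/sh
# restore_fs BACKUPDIR HOST TARGET
# Push one backup onto a freshly formatted, mounted filesystem.
# The function name and argument order are assumptions for this sketch.
restore_fs() {
    # Trailing slashes matter: copy the contents of BACKUPDIR into TARGET.
    rsync -axH --numeric-ids "${1%/}/" "root@$2:${3%/}/"
}
```

For example, restore_fs /backup/rsync/asylum/_home_asylum.current asylum /home/asylum would put the most recent backup of /home/asylum back in place.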
Recovering entire systems from backups
This is where things get a little ugly. Of course this is for times that are already ugly because you probably just lost your boot drive and have a brand new one installed that is completely blank. This procedure varies a bit depending on what OS you are restoring but here is the general idea:
- Boot from some media that gives you an OS, networking, rsync, and ssh. Knoppix can do the job for Linux systems. In the case of OpenBSD I use their install disc and then use ftp to transfer a tarball of the rsync backup instead of using rsync. The same thing will work in Solaris although it is usually easier to NFS mount the backup repository. Of course if you are restoring the backup server you don't need any of these tools as a simple cp will do.
- Partition the new drive with fdisk or whatever you usually use. If you follow my advice in the advanced section you will have an .sfdisk file and you can duplicate the original partition table with 'sfdisk /dev/whatever < file.sfdisk'.
- Format the new partitions. Linux choices are mke2fs, mkfs.ext4, mkfs.xfs, and mkswap. For most other operating systems it is simply newfs.
- Mount up the new partitions in a convenient location with something like:
mkdir /s
mount -vt [fstype] /dev/[root partition] /s
mkdir /s/usr /s/var /s/proc /s/dev /s/tmp
chmod 1777 /s/tmp
mount -vt [fstype] /dev/[var partition] /s/var
mount -vt [fstype] /dev/[usr partition] /s/usr
- Now run your filesystem level restores just like you would if you weren't recovering the entire system. You will need to restore each filesystem that was on the old boot disk.
- If you have made any changes such as device names, mount points, or partition layouts you should now update /s/etc/fstab and /s/boot/grub/menu.lst.
- Fix up /dev if needed
- Now you have to make the disk bootable again. This totally varies by operating system and boot loader...
For Linux systems using grub:
- grub-install --root-directory=/s /dev/sdwhatever
Or, if that doesn't work:
- mount -vo bind /dev /s/dev
- mount -vo bind /proc /s/proc
- chroot /s /bin/bash
- grub
- root (hd0,0) - or whatever partition matches your boot disk
- setup (hd0)
- exit
- exit
For Linux systems using lilo (why are you still using lilo?):
- mount -vo bind /dev /s/dev
- mount -vo bind /proc /s/proc
- chroot /s /bin/bash
- lilo -v
- exit
For OpenBSD systems:
- cd /s/usr/mdec
- ./installboot /s/boot ./biosboot /dev/rwd0c (or /dev/rsd0c if using SCSI)
For Solaris systems:
- installboot /s/usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
Advanced topics
- Format of backup repository: Assuming you are using a Linux box as your backup server you have multiple choices for the filesystem type that you want to format the backup drive with. I generally use ext4 because it is the fastest well established filesystem available currently and it does a good job as long as there aren't too many files for fsck to handle in a reasonable amount of time. However, xfs is also a good choice because it is better at dealing with large files and it is much better at doing the delete portion of the backup rotation. XFS also eliminates the need for the occasional offline fsck which may make it your only choice if you have many millions of files to deal with. You may want to play with these 2 choices a bit before you make your final decision. I do not recommend using JFS as it has horrible performance or reiserfs as it has horrible reliability.
- ZFS: If you have many millions of files to deal with you may discover that this system simply takes too long to delete old backups and if you ever need an fsck you may even be down for days waiting for it to finish. ZFS on OpenSolaris is the answer to your prayers. ZFS can handle multiple LVM-like snapshots. The benefit here is that you can run rsync backups without the --link-dest parameter and simply overwrite the previous backup each run. Then you use the ZFS snapshots to retain the old backups. Each old backup becomes a snapshot. The snapshots are created and deleted in less than a second removing the need for the long rm operations to purge old backups and allowing rsync to just sync files without bothering to create hard links. Hopefully soon btrfs will give us this capability in Linux but until then ZFS on OpenSolaris is the thing that completes large scale rsync backups.
- RAIDed backup repository: IMHO RAID redundancy is not needed on the backups because they are an extra copy of the data anyways. My backup drive is a 1TB RAID-0 made up of 2 500GB IDE drives and the RAID-0 provides no redundancy at all. If you are extra paranoid and you want redundancy through RAID-1 or RAID-5 then you can add it but most people will find that it isn't needed. If you use RAID-0 or RAID-5 you should set your strip size small because most of the disk work is done at the filesystem metadata level not the file level so that is where you want your speed boost. If you must have redundancy on your backups I would recommend using RAID-1 or RAID-10 instead of RAID-5 for performance reasons but you can probably live with RAID-5 if your controller is fast enough.
- Separating the rotation process from the backups: If your backup window is tight you can separate out the components of the backup and the rotation. You can run your backup at night during the short window and you can delete the old backups during the day when the backup server would otherwise be idle since it doesn't affect the other servers.
- Cross-platform handling of /dev and other device files: Since different operating systems handle major and minor numbers differently I suggest excluding /dev from the rsync backups. I keep a /dev.tar tarball on all of my boxes with a backup of /dev in it just in case I ever need to restore that. The tarball will be very small since there are no actual data in it. Note that this is completely unimportant on Linux systems that use udev for /dev.
- What is different between 2 backups: I wrote a perl script that scans 2 backups of the same directory and lists what has changed between them. I have published that script at http://www.sanitarium.net/unix_stuff/backups/diff_backup.pl.txt
- Storing data that isn't kept in a file: I wrote a perl script that does backups of data that isn't stored in files such as partition tables. My main backup script runs this "getinfo" script whenever it backs up a root filesystem. The script is published at http://www.sanitarium.net/unix_stuff/backups/getinfo.pl.txt. I also have Linux and OpenBSD examples of its tab files published at http://www.sanitarium.net/unix_stuff/backups/asylum_backup.getinfo.tab.txt and http://www.sanitarium.net/unix_stuff/backups/hellmouth_backup.getinfo.tab.txt
- rsync -n / --dry-run: This is rsync's "dry run mode". You can use this on any other rsync command to have rsync tell you what it would have done without the -n parameter without actually doing anything.
- rsync -i / --itemize-changes: This tells rsync to display why it thinks that a file needs to be transferred. If you suspect that rsync is transferring files it shouldn't need to this can tell you why. Note that it does NOT imply -n.
- rsync -W / --whole-file: This tells rsync to transfer entire files instead of using its block level comparison system. If you have a nice fast link (like a LAN) this can make things faster since rsync doesn't have to checksum files at all but if you are transferring across the internet you don't want this.
- rsync -c / --checksum: This tells rsync to checksum all files. Normally rsync compares the timestamp and the size of a file to determine if it has changed since the last backup. If you use -c rsync will checksum ALL files which will take a long time. You wouldn't normally use this option however it is good to have if you believe your data has become corrupted in a way that doesn't affect the information you see in an ls -l output.
- rsync -S / --sparse: This tells rsync to turn files with large chunks of null characters into sparse files as it transfers them.
- rsync --delete-*: There are several options that control when rsync does the deletion process. Normally you would just use --delete and let rsync use the fastest one available in your version of rsync however there are times when you want to force it to behave differently. As of version 3.0.6 --delete-during is the default for --delete. See the man page for more information.
- rsync -T / --temp-dir: If you have a tmpfs mount you can get a very small speed boost by using this parameter. It causes the partial files used during the block level transfers to be stored in an alternate (faster) location until the file is complete. This will only help if you are doing block level transfers and if the directory you specify is on a tmpfs mount. Note that your tmpfs mount must be big enough to hold any single file or it will cause rsync to fail with an insufficient disk space error. Also, if your tmpfs mount goes into swap you will completely kill your performance. IOW, don't use this unless you are sure it is going to help.
- rsync --bwlimit: Allows you to limit how much bandwidth rsync uses in its network communications. It is measured in KB per second.
- rsync --ignore-errors: This overrides one of rsync's built in safety features. Normally if there is a problem during the backup rsync will NOT run its delete pass. If you use --ignore-errors the delete pass will run regardless of any other errors. Note that this isn't as dangerous as it sounds since you still have older backups.
- rsync --max-delete: This allows you to reimplement the safety feature above with a threshold. You can tell rsync how many files it can delete before it decides that something must be wrong and stops.
- rsync -z / --compress: This tells rsync to use zlib compression on its communications. This would be good if you are backing up over the internet but it is usually counter productive on a LAN.
- rsync -A / --acls: This tells rsync to preserve ACLs in addition to permissions.
- rsync -X / --xattrs: This tells rsync to preserve extended attributes in addition to permissions.
- push instead of pull: Rsync can push data just as well as it can pull it. It is possible to have all servers push their backups to the backup server instead of the backup server pulling the data from them. I personally don't like this approach because it means that all your servers have the key to your backup server instead of the other way around and because you have to engineer a much more complicated way of doing the rotations as well as making sure you don't have 10 servers trying to back themselves up at once which would flood the backup server.
- Buddy backups: If you don't want to dedicate a box to running backups you could pair off your boxes and have them backup each other. You could also do this in a ring layout.
- LVM Snapshots: It is also possible to use LVM to take an instant snapshot of a filesystem and then backup that snapshot. This would remove any chance of a filesystem changing during the backup.
- Squashfs for archives: If you want to make a permanent archive of a particular backup (perhaps to burn it) squashfs is a great way to do it. Squashfs creates a compressed mountable archive of a directory tree. You create a squashfs archive with mksquashfs which works much like mkisofs and then you can mount the resulting file as a loopback device.
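The -n and -i options from the list above can be combined to get a rough idea of what changed between two backups. This is not the perl script mentioned earlier, just a sketch of the same idea using rsync itself; backup_changes is a name invented for the example.

```shell
#!/bin/sh
# backup_changes OLD NEW
# Dry run: itemize what rsync would have to do to turn OLD into NEW.
# Nothing is modified because of -n. The function name is an assumption.
backup_changes() {
    rsync -aHn --itemize-changes --delete "${2%/}/" "${1%/}/"
}
```

For example, backup_changes _home_asylum.2005-07-18.16-32-54 _home_asylum.2005-07-25.15-32-42 would list the files that were added, changed, or deleted between those two runs.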
Helpful links