I’ve been using Amazon’s S3 service for a couple of months now. It was working OK using s3sync and a cron job, but it seemed like it wasn’t actually making incremental backups, and I wasn’t 100% sure that it was backing up everything (i.e. it appeared to be crapping out once in a while). I searched around for various S3 backup solutions and found a handy utility called duplicity. Even handier, it’s available for most distributions (Arch Linux, the Debian-based ones, and Fedora anyway).
From the duplicity home page:
Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.
What you’ll need
You’ll need to make sure you have a few things installed before you install duplicity, namely librsync and GnuPG. Luckily, if the duplicity package is available for your distribution, the package manager will probably pull them in for you.
Here’s a rundown of the steps involved:
- Generate a new GnuPG key
- Create a simple shell script wrapper
- Create a cron job
Generating a new Key
Start by generating a new gpg key for duplicity. Or if you have an existing one, you can use that.
N.B. I set this up on a Slice running Arch64 and had problems generating a new key (gpg --gen-key). Apparently, it could not gather enough entropy. Not a problem though: if this happens to you, just generate the keys elsewhere and import them later.
#~ gpg --gen-key
gpg (GnuPG) 1.4.7; Copyright (C) 2006 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.
Please select what kind of key you want:
(1) DSA and Elgamal (default)
(2) DSA (sign only)
(5) RSA (sign only)
Your selection?
Default (DSA and Elgamal) is fine here.
DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)
The default (2048) is more than enough for this. Change it to whatever you want.
Requested keysize is 2048 bits
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)
Unless you want the key to expire (I don’t see why one would want that), the default is what we want.
Key does not expire at all
Is this correct? (y/N)
Um, yes, this is correct.
You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
"Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"
Real name: DuplicityBackup
Email address: duplicity@mydomain.com
Comment: Key for Duplicity
You selected this USER-ID:
"DuplicityBackup (Key for Duplicity) <duplicity@mydomain.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?
Enter whatever information you want here and type ‘O’ for ‘Okay’
You need a Passphrase to protect your secret key.
Enter Passphrase:
Enter something. Anything. The more complex the better. This is your private data. Remember that it’s being transferred over HTTP to a server you don’t own. I don’t care if it is Amazon. Remember what you type because you’ll need it later while creating the wrapper script.
gpg: key 9929DAB1 marked as ultimately trusted
public and secret key created and signed.
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 2 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 2u
pub 1024D/9929DAB1 2007-11-15
Key fingerprint = 3378 8E93 4349 0E7F 44F3 7C81 2460 5A11 9929 DAB1
uid DuplicityBackup (Key for Duplicity) <duplicity@mydomain.com>
sub 2048g/5385A6BB 2007-11-15
And you’re done. Make note of the key (in this case, 9929DAB1) as we’ll need that later too.
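If you hit the entropy problem mentioned earlier and generated the key on another machine, the key pair has to be moved to the backup server. Here is a rough sketch of one way to do that with gpg’s export and import options. The key ID is the one from the transcript above; the two file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: move a GnuPG key pair from the machine where it was generated
# to the server that runs the backups.
# KEY_ID comes from the transcript above; the file names are hypothetical.
KEY_ID=9929DAB1
PUB_FILE=duplicity-pub.asc
SEC_FILE=duplicity-sec.asc

# Run this on the machine where the key was generated.
export_backup_key() {
    (
        umask 077    # keep the exported files private to the current user
        gpg --armor --export "$KEY_ID" > "$PUB_FILE"
        gpg --armor --export-secret-keys "$KEY_ID" > "$SEC_FILE"
    )
}

# Run this on the backup server after copying both files over (scp, say).
import_backup_key() {
    gpg --import "$PUB_FILE" "$SEC_FILE"
    gpg --list-keys "$KEY_ID"
}
```

An imported key is not marked as ultimately trusted the way a freshly generated one is, so you may also need to run gpg --edit-key 9929DAB1 and use its trust command before encryption to that key works unattended. Delete the exported secret key file once the import has succeeded.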
But I already have a key I want to use
OK, fine. Chances are, if you have a key already, you know how to find its ID. If you don’t, run gpg --list-keys. The ID you want is in the ‘pub’ line, right after the forward slash ‘/’.
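To make “the part after the slash” concrete, here is a small sketch that pulls the key ID out of the ‘pub’ line. It assumes the GnuPG 1.4-style listing shown in the transcript above; the function name is just something I made up. Newer GnuPG releases lay the listing out differently by default, so there you would probably need --keyid-format short to get a matching ‘pub’ line.

```shell
#!/bin/sh
# Sketch: print the short key ID from each "pub" line of GnuPG 1.4-style
# --list-keys output, i.e. the part after the '/' in "1024D/9929DAB1".
# extract_key_ids is a hypothetical helper name.
extract_key_ids() {
    awk '$1 == "pub" { split($2, parts, "/"); print parts[2] }'
}

# Typical use:
#   gpg --list-keys | extract_key_ids
# Demonstration on the sample line from the transcript:
echo "pub 1024D/9929DAB1 2007-11-15" | extract_key_ids   # prints 9929DAB1
```

Lines starting with ‘sub’ or ‘uid’ are ignored, so only primary key IDs are printed.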
The Wrapper
This can be written in any language, really. I chose shell because it’s easy and basic. You could run duplicity directly on the command line right now, but a wrapper is much more convenient and makes adding a cron job later a lot easier. Here’s what you’ll need:
- Your Amazon S3 Access Key ID and Secret Access Key. If you don’t have one, you’ll have to sign up for one.
- Your GPG key
- Your GPG key’s passphrase
- A list of directories you want to back up
Here’s a basic script that works for me:
#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export PASSPHRASE=<your-gpg-passphrase>
GPG_KEY=<your-gpg-key>
# The source of your backup
SOURCE=/
# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST=s3+http://<your-bucket-name>
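# Back up only the included directories; the final --exclude drops
# everything else under SOURCE. Each continued line needs a trailing
# backslash, and the pattern is quoted so the shell never expands it.
duplicity \
--encrypt-key="${GPG_KEY}" \
--sign-key="${GPG_KEY}" \
--include=/boot \
--include=/etc \
--include=/home \
--include=/root \
--include=/var/lib/mysql \
--exclude='/**' \
"${SOURCE}" "${DEST}"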
# Unset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE
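If the list of directories changes often, the --include options can be generated from a single variable instead of being typed out one per line. This is only a sketch of that idea, not what my script does. BACKUP_DIRS, run_backup and the DUPLICITY override are names I invented for illustration, and the key ID and bucket name are placeholders.

```shell
#!/bin/sh
# Sketch: build the duplicity argument list from a space-separated
# directory list. All names and values here are placeholders.
BACKUP_DIRS="/boot /etc /home /root /var/lib/mysql"
GPG_KEY=9929DAB1
DEST=s3+http://your-bucket-name

run_backup() {
    # Start with the key options, then add one --include per directory.
    set -- "--encrypt-key=${GPG_KEY}" "--sign-key=${GPG_KEY}"
    for dir in $BACKUP_DIRS; do    # unquoted on purpose: split on spaces
        set -- "$@" "--include=${dir}"
    done
    # Exclude everything not explicitly included, then source and target.
    set -- "$@" "--exclude=/**" / "$DEST"
    # Setting DUPLICITY=echo prints the arguments instead of backing up.
    "${DUPLICITY:-duplicity}" "$@"
}
```

Running it with DUPLICITY set to echo is a cheap way to check the generated command line before any data leaves the machine. Directory names containing spaces would need a different approach, since the list is split on whitespace.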
And, that’s pretty much it. Save the file as something creative, like, backup and make it executable (chmod 700 backup). If you want to test it first (and you have the disk space), change the destination to some /tmp directory or external HDD. Once you’ve got it working the way you want, set it up as a cron job. Daily, weekly, monthly… doesn’t matter.
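For the cron job itself, a crontab entry along these lines should do. The sketch below only prints the entry; the script location, log file and 02:30 schedule are assumptions for illustration, so adjust them to taste.

```shell
#!/bin/sh
# Sketch: print a crontab entry that runs the wrapper every night at 02:30
# and appends its output to a log file. Both paths are hypothetical.
BACKUP_SCRIPT=/root/bin/backup
BACKUP_LOG=/var/log/duplicity-backup.log

cron_entry() {
    # Fields: minute hour day-of-month month day-of-week command
    printf '%s\n' "30 2 * * * ${BACKUP_SCRIPT} >> ${BACKUP_LOG} 2>&1"
}

cron_entry
# To install it alongside any existing entries:
#   ( crontab -l 2>/dev/null; cron_entry ) | crontab -
```

Cron usually runs jobs with a very small PATH, so it is safer to use absolute paths for anything the wrapper calls that does not live in a standard directory.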
Duplicity is a nice backup solution for any situation, not just Amazon’s S3. It can handle HTTP, SCP and local backups as well. I highly recommend reading the duplicity man page and checking out the various command line arguments and available options.
Thanks go out to Tim McCormack and to Ben and Ron, whose articles got me started.
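A backup is only useful if it can be checked and restored, so here are a few of the other actions from the man page wrapped in small functions. Treat this as a sketch: the function names, bucket name and DUPLICITY override are mine, and the command syntax has changed between duplicity releases, so check the man page for your installed version. All three need the same AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and PASSPHRASE variables exported as the wrapper script does.

```shell
#!/bin/sh
# Sketch: other duplicity actions run against the same backup URL.
# DEST is a placeholder. Setting DUPLICITY=echo prints the arguments
# instead of running duplicity.
DEST=s3+http://your-bucket-name

# Summarise the full and incremental backup sets stored at DEST.
backup_status() {
    "${DUPLICITY:-duplicity}" collection-status "$DEST"
}

# Compare the latest backup with the live filesystem.
backup_verify() {
    "${DUPLICITY:-duplicity}" verify "$DEST" /
}

# Restore the latest backup into the directory given as $1.
backup_restore() {
    "${DUPLICITY:-duplicity}" restore "$DEST" "$1"
}
```

Restoring into an empty scratch directory every so often is a reasonable way to confirm that the key, the passphrase and the bucket all still work together.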
Tim points out that adding your GPG passphrase to the shell script might not be the most secure method, especially in a shared environment. I agree; however, it kind of defeats the purpose of automated backups if you have to actually enter your passphrase (twice) on the command line when calling the wrapper script. One way I managed to get around this is to create a simple C program that prints the passphrase.
Here’s the C code (saved as gpg-passphrase.c):
#include <stdio.h>

int main()
{
    printf("your-gpg-passphrase");
    return 0;
}
Compile
#~ gcc gpg-passphrase.c -o gpg-passphrase
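Make it readable and executable by your user only, so no one else can run it. (Note that chmod +s actually sets the setuid and setgid bits, not the sticky bit; it’s the chmod 700 that keeps other users out.)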
#~ chmod 700 gpg-passphrase
#~ chmod +s gpg-passphrase
Modify the wrapper script to use the binary for the passphrase
export PASSPHRASE=$(gpg-passphrase)
You might go as far as to do the same thing for your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as well. There are probably other ways around this, but this was a quick and dirty way to avoid having readable strings in shell scripts. I figure, if someone has rooted my server, I’ve got bigger problems to worry about than my data sitting on Amazon’s S3.
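One of those other ways, sketched here as an alternative to the compiled binary and not something I actually run, is to keep the passphrase in a file that only its owner can access and have the wrapper refuse to read it if the permissions are too loose. The file path and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: read the passphrase from a file that group and others cannot
# access. The default path is a hypothetical example.
read_passphrase() {
    file=${1:-/root/.duplicity-passphrase}
    # Characters 5-10 of the ls mode string are the group/other bits.
    perms=$(ls -ld "$file" 2>/dev/null | cut -c5-10)
    if [ "$perms" != "------" ]; then
        echo "refusing to read $file: missing or open to group/others" >&2
        return 1
    fi
    cat "$file"
}

# In the wrapper script this would replace the hard-coded passphrase:
#   PASSPHRASE=$(read_passphrase) || exit 1
#   export PASSPHRASE
```

The assignment and the export are kept on separate lines on purpose: writing export PASSPHRASE=$(read_passphrase) would hide a failure of the function behind the exit status of export.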










