randys.org

How-To: Automated Backups to Amazon’s S3 with Duplicity

I’ve been using Amazon’s S3 service for a couple of months now. It was working OK with s3sync and a cron job, but it didn’t seem to be making incremental backups, and I wasn’t 100% sure it was backing up everything (it appeared to be crapping out once in a while). I searched around for S3 backup solutions and found a handy utility called duplicity. Even handier, it’s packaged for most distributions (Arch Linux, Debian and its derivatives, and Fedora, anyway).

From the duplicity home page:

Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

What you’ll need

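You’ll need a few things installed before you install duplicity, namely librsync and GnuPG. Luckily, if duplicity is packaged for your distribution, the package manager will probably pull them in, so you needn’t worry. Duplicity itself is written in Python. Commenters below ran into a missing GnuPGInterface module and needed python-boto for S3 on Debian and Ubuntu, so keep an eye out for those too.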
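Before going further, you can quickly confirm that the command-line tools are in place by looking them up on your PATH. The check_deps helper below is hypothetical and uses the shell's built-in command -v. librsync is a library rather than a command, so check for it with your package manager instead.

```shell
# Hypothetical helper: check_deps CMD...
# Prints "missing: CMD" to stderr for every command not found on PATH
# and returns non-zero if any were missing.
check_deps() {
    local cmd status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_deps gpg duplicity || echo "install the missing packages first"
```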

Here’s a rundown of the steps involved:

  1. Generate a new GnuPG key
  2. Create a simple shell script wrapper
  3. Create a cron job

Generating a new Key

Start by generating a new gpg key for duplicity. Or if you have an existing one, you can use that.

N.B. I set this up on a Slice running Arch64 and had problems generating a new key (gpg --gen-key). Apparently it could not gather enough entropy. Not a problem, though: if this happens to you, just generate the keys elsewhere and import them later.

#~ gpg --gen-key
gpg (GnuPG) 1.4.7; Copyright (C) 2006 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.

Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection?

Default (DSA and Elgamal) is fine here.

DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)

The default (2048) is more than enough for this. Change it to whatever you want.

Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)

Unless you want the key to expire (I don’t see why one would want that), the default is what we want.

Key does not expire at all
Is this correct? (y/N)

Um, yes, this is correct.

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: DuplicityBackup
Email address: duplicity@mydomain.com
Comment: Key for Duplicity
You selected this USER-ID:
    "DuplicityBackup (Key for Duplicity) <duplicity@mydomain.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?

Enter whatever information you want here and type ‘O’ for ‘Okay’.

You need a Passphrase to protect your secret key.

Enter Passphrase:

Enter something. Anything. The more complex the better. This passphrase guards the key that protects your private data, and that data is being transferred over HTTP to a server you don’t own. I don’t care if it is Amazon. Remember what you type, because you’ll need it later when creating the wrapper script.

gpg: key 9929DAB1 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 2 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 2u
pub 1024D/9929DAB1 2007-11-15
    Key fingerprint = 3378 8E93 4349 0E7F 44F3 7C81 2460 5A11 9929 DAB1
uid DuplicityBackup (Key for Duplicity) <duplicity@mydomain.com>
sub 2048g/5385A6BB 2007-11-15

And you’re done. Make a note of the key ID (in this case, 9929DAB1), because we’ll need it later too.
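You may run into entropy trouble, or you may want to script key creation on another machine as the note above suggests. For either case, GnuPG has an unattended mode that reads its answers from a parameter file passed to gpg --batch --gen-key. The sketch below prints such a file using the same answers as the walkthrough: a DSA-1024 primary key, a 2048-bit ElGamal subkey, and no expiry. The helper name write_key_params is hypothetical. The field layout follows the GnuPG 1.4-era unattended key generation format, and newer GnuPG releases may prefer different key types.

```shell
# Hypothetical helper: write_key_params NAME COMMENT EMAIL PASSPHRASE
# Prints a GnuPG unattended key-generation parameter file mirroring the
# interactive answers above (DSA 1024 + ElGamal 2048, never expires).
write_key_params() {
    cat <<EOF
Key-Type: DSA
Key-Length: 1024
Subkey-Type: ELG-E
Subkey-Length: 2048
Name-Real: $1
Name-Comment: $2
Name-Email: $3
Expire-Date: 0
Passphrase: $4
%commit
EOF
}

# Usage, on a machine with enough entropy:
#   write_key_params DuplicityBackup "Key for Duplicity" \
#       duplicity@mydomain.com 'your-gpg-passphrase' > key-params
#   gpg --batch --gen-key key-params
#   rm key-params   # it holds the passphrase in plain text
```

Afterwards, export the key pair with gpg --export and gpg --export-secret-keys, then bring it onto the server with gpg --import.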

But I already have a key I want to use

OK, fine, but chances are, if you already have a key, you know how to find it. If you don’t, run gpg --list-keys. The key ID is on the ‘pub’ line, after the forward slash ‘/’ (9929DAB1 in pub 1024D/9929DAB1).
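If you’d rather pull the key ID out in a script than read it off the screen, gpg’s --with-colons listing is the machine-readable form. In that format, field 5 of a pub record is the 16-character long key ID, and its last 8 characters are the short ID shown after the slash. A minimal sketch follows; the helper name is hypothetical, and the test record below uses 24605A119929DAB1, the last 16 hex digits of the fingerprint shown earlier.

```shell
# Hypothetical helper: short_key_id
# Reads `gpg --list-keys --with-colons` output on stdin and prints the
# 8-character key ID (last 8 chars of field 5) for each 'pub' record.
short_key_id() {
    awk -F: '$1 == "pub" { print substr($5, length($5) - 7) }'
}

# Usage:
#   gpg --list-keys --with-colons duplicity@mydomain.com | short_key_id
```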

The Wrapper

This can be written in any language, really. I chose shell because it’s easy and basic. You could run duplicity directly on the command line, but a wrapper is much more convenient and makes adding a cron job later a lot easier. Here’s what you’ll need:

  • Your Amazon S3 Access Key ID and Secret Access Key. If you don’t have them, you’ll have to sign up for S3.
  • Your GPG key ID
  • Your GPG key’s passphrase
  • A list of directories you want to back up

Here’s a basic script that works for me:

#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export PASSPHRASE=<your-gpg-passphrase>

GPG_KEY=<your-gpg-key>

# The source of your backup
SOURCE=/

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST=s3+http://<your-bucket-name>

# Back up the --include'd directories and exclude everything else.
# Every line but the last ends in a backslash so the shell sees a
# single command; '/**' is quoted so it reaches duplicity unexpanded.
duplicity \
    --encrypt-key="${GPG_KEY}" \
    --sign-key="${GPG_KEY}" \
    --include=/boot \
    --include=/etc \
    --include=/home \
    --include=/root \
    --include=/var/lib/mysql \
    --exclude='/**' \
    "${SOURCE}" "${DEST}"

# Clear the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE

And that’s pretty much it. Save the file as something creative, like backup, and make it executable with chmod 700 backup. That mode also keeps other users from reading the keys and passphrase inside it. If you want to test it first (and you have the disk space), point the destination at a local directory such as DEST=file:///tmp/backup-test, or at an external HDD. Once you’ve got it working the way you want, set it up as a cron job. Daily, weekly, monthly… doesn’t matter.
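When the wrapper runs from cron, nobody is around to answer a prompt, so it can help to stop early if a credential didn’t get set. The hypothetical require_vars helper below checks the exported environment with printenv, since exported variables are what duplicity actually sees. A sample crontab line follows in the comments; the 03:30 schedule, the /root/backup path, and the log file are assumptions.

```shell
# Hypothetical helper: require_vars NAME...
# Returns non-zero, naming each variable that is not exported or is
# empty, so a cron run fails loudly instead of waiting on a prompt.
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "backup: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In the wrapper, just before the duplicity call:
#   require_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE || exit 1
#
# Example crontab entry (edit with crontab -e), nightly at 03:30:
#   30 3 * * * /root/backup >> /var/log/backup.log 2>&1
```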

Duplicity is a nice backup solution for any situation, not just Amazon’s S3. It can handle HTTP, SCP and local backups as well. I highly recommend reading the duplicity man page and checking out the various command-line arguments and available options.
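As a commenter notes further down, getting files back is mostly a matter of swapping the source and destination: duplicity restores when the remote URL comes first and a local directory second. Below is a sketch of a hypothetical restore_from helper. It passes an optional time such as 3D (three days ago) to --restore-time. With DRY_RUN=1 it only prints the command, so you can check it before running it. The restore needs the same PASSPHRASE and the secret key, so keep an exported copy of the key (gpg --export-secret-keys --armor 9929DAB1) somewhere other than the server you’re backing up. Duplicity may also refuse to restore into a directory that already exists.

```shell
# Hypothetical helper: restore_from URL TARGET [TIME]
# Builds "duplicity [--restore-time TIME] URL TARGET"; remote-first
# argument order is what makes duplicity restore rather than back up.
# With DRY_RUN=1 the command is printed instead of executed.
restore_from() {
    local src=$1 target=$2 when=${3:-}
    set -- duplicity
    if [ -n "$when" ]; then
        set -- "$@" --restore-time "$when"
    fi
    set -- "$@" "$src" "$target"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage, with the AWS keys and PASSPHRASE exported as in the wrapper:
#   DRY_RUN=1 restore_from s3+http://<your-bucket-name> /tmp/restored 3D
#   restore_from s3+http://<your-bucket-name> /tmp/restored
```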

Thanks go out to Tim McCormack and to Ben and Ron, whose articles got me started.


Tim points out that adding your GPG PASSPHRASE to the shell script might not be the most secure method, especially in a shared environment. I agree. However, it kind of defeats the purpose of automated backups if you have to enter your passphrase (twice) on the command line every time the wrapper script runs. One way I got around this was to write a simple C program that prints the passphrase.

Here’s the C code:

#include <stdio.h>
int main()
{
    printf("your-gpg-passphrase");
    return 0;
}

Compile

#~ gcc gpg-passphrase.c -o gpg-passphrase

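Make it executable by your user only. Mode 700 is what keeps other users from running it. There’s no need for chmod +s, which sets the setuid and setgid bits (not the sticky bit, which only matters on directories).

#~ chmod 700 gpg-passphrase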

Modify the wrapper script to use the binary for the passphrase. Give its full path if it isn’t on your PATH, because cron runs with a minimal one:

export PASSPHRASE=$(gpg-passphrase)

You might go as far as doing the same thing for your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. There are probably other ways around this, but this was a quick and dirty way to keep readable strings out of shell scripts. I figure that if someone has rooted my server, I’ve got bigger problems to worry about than my data sitting on Amazon’s S3.
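The catch, as several commenters point out below, is that the passphrase still sits in the compiled binary as plain text, where running strings on it will find it. A simpler route, along the lines of the first comment, is to keep the passphrase in its own file with mode 600 and read it at run time. The helper below is a hypothetical sketch of that approach. It calls GNU stat -c, so it assumes Linux coreutils, and the file path in the usage comment is made up.

```shell
# Hypothetical helper: read_secret FILE
# Prints FILE's contents only if no one but the owner can read it
# (mode 600 or 400); otherwise complains on stderr and returns 1.
read_secret() {
    local file=$1 mode
    mode=$(stat -c '%a' "$file") || return 1
    case "$mode" in
        600|400) cat "$file" ;;
        *) echo "refusing to read $file: mode $mode" >&2; return 1 ;;
    esac
}

# In the wrapper (hypothetical path). Assign first, then export, so a
# failure from read_secret isn't masked by export's own exit status:
#   PASSPHRASE=$(read_secret /root/.duplicity-passphrase) || exit 1
#   export PASSPHRASE
```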

25 Responses to “How-To: Automated Backups to Amazon’s S3 with Duplicity”

  1. "chmod +s gpg-passphrase" sets the setuid and setgid bits, not the sticky bit. Don’t do that. The sticky bit is only meaningful for directories anyway, and set[ug]id is not something you want to throw around casually.

    Storing your passphrase in a binary doesn’t help security at all – you still can read the password from the file (if not by eye or with "strings", then just by copying the binary and adding an execute bit) and can also read it if you have only execute permissions (you don’t even need read permission, which you do for a #! script).

    On the other hand, there is value in not keeping it with your main script. Separating your code and key is a good idea so you can check your script into a version control repository without putting your password in there.

    I would probably just put it in a textfile with permissions 0600 and use either PASSPHRASE=$(cat textfile) (which duplicity seems set up for) or gpg --passphrase-fd=0 < textfile (which you would have to modify the code for).

  2. Thanks for your nice How To, got me to a working S3 backup solution in 10 mins.

    However, the bash script didn’t work for me, because there was another \ concat missing after the duplicity line, it now reads:

    duplicity \
        --encrypt-key=${GPG_KEY} \
        --sign-key=${GPG_KEY} \

    Thanks again for sharing this one.

    mat

    http://better-idea.org/trakkor Tracking single webpage elements via Atom Feeds and Web Hooks.

  3. rsa dsa…

    Intriguing idea, but I don’t know if I believe you one hundred percent….

  4. Jean-Pierre

    If you’re using Debian or Ubuntu, be sure to also install the python-boto package

  5. I am getting the following error when I try to run the bash script:

    sudo ./backup_script
    Traceback (most recent call last):
      File "/usr/bin/duplicity", line 29, in ?
        from duplicity import collections, commandline, diffdir, dup_temp, \
      File "/usr/lib64/python2.4/site-packages/duplicity/collections.py", line 22, in ?
        import log, file_naming, path, dup_time, globals, manifest
      File "/usr/lib64/python2.4/site-packages/duplicity/path.py", line 653, in ?
        import robust, tarfile, log, selection, globals, gpg, file_naming
      File "/usr/lib64/python2.4/site-packages/duplicity/gpg.py", line 22, in ?
        import GnuPGInterface, misc, log, path
    ImportError: No module named GnuPGInterface
    ./backup_script: line 20: --encrypt-key=48185B89: command not found

    I don’t know what it’s fussing about with no module GnuPGInterface. I installed gpg via: yum install gpg.

  6. Santiago Aguiar

    I don’t know why duplicity requires the PASSPHRASE even when you don’t want to sign the data, since for encryption, access to the unencrypted public key should be enough.

    Anyway, make sure your generated encryption key is not only stored on the server you are backing up. Otherwise, if the server dies (e.g. disk failure or WMD), there goes your key, and with it your ability to recover your backed-up data.

  7. Nice tutorial, first page in Google for duplicity s3 ubuntu :-) I also recommend reading

    https://help.ubuntu.com/community/DuplicityBackupHowto

  8. [...] automatic backup to run via cron: DuplicityBackupHowto – Community Ubuntu Documentation Automated Backups to Amazon’s S3 with Duplicity – by randys.org Posted in ubuntu, backup, cloud computing | Trackback | del.icio.us | Top Of [...]

  9. OK, but what about the keys? You need to back them up now.

  10. [...] is the combination of a couple of tips and tricks I found while trawling the web, notably from this howto at randys.org and this post over at the linode.com forums. Credit and thanks goes to the original authors – I [...]

  11. Why’d you need to give duplicity the GPG key password? To encrypt something using GPG you don’t need the password, you only need the password to decrypt…

  12. Nice walkthrough.

    One suggested improvement. You can remove the need for a GPG password by using the --archive-dir option. That, combined with the --encrypt-key option (which you’re already using) allows for unattended operation without a GPG password.

    You might have to disable signing the packages as I think that will require the GPG password. Potentially that’s a minimal drop in security.

    The --archive-dir option keeps a local, unencrypted copy of the latest sig files. So duplicity doesn’t have to download / decrypt them. Other than signing, that removes the need for a GPG passphrase.

    I suppose one option would be to sign with a different key (--sign-key). Potentially you could supply the GPG key for the signature key and not the encryption key. Thus, if somebody was able to get your key and passphrase, they would only be able to replace your backups with fakes, not decrypt your data.

    Cheers – Callum.

  13. Setting environment variables is not secure on a shared system. On most systems, running ‘ps ex’ will show the environment of all running processes.

  14. [...] I ended up trying out Amazon S3, using duplicity to rsync + PGP encrypt my data, using instructions here. Pricing for Amazon S3 storage starts out at 15 cents per gigabyte month. Personally I think this [...]

  15. [...] How-To: Automated Backups to Amazon’s S3 with Duplicity | randys.org – Example automated backup script for Duplicity using GPG and AWS's S3. [...]

  16. Great guide, but quick question – I am rather new to this. How would I then recover my data in case of a crash? That’s what I’m rather lost about (now including a key, etc)

  17. @Nabeel

    Have a look here. It’s pretty much like reversing ${SOURCE} and ${DEST} and adding the date you want to restore from.

  18. Thanks for the reply! How many days does this keep a backup? Or is it continuous?

  19. Nevermind, got the answer to that! Thanks very much!

  20. [...] post adds only a couple small details to work described at randys.org and cenolan.com – go there for background on this post and useful scripts for automated Duplicity [...]

  21. Putting your GPG password into a C executable isn’t really going to help since it will be stored in there as text … basically I would say unless you are storing state secrets don’t worry about it, and if you are, enter your password by hand every time you back up.

  22. Woodside, I realize that now, and I agree that backups of a small-time blog like this don’t really need to be ultra strict. This was simply my method (at the time) of automating backups. On a VPS with no other users, storing your passphrase in a human-readable text file is less of an issue (as long as your root password is relatively difficult to crack) than it is in a multi-user environment. If security were my ultimate goal, I probably wouldn’t trust my data to a third-party service; I’d use tape backups and store them in an offsite location.

  23. As per my earlier comment, you can do away with the need for a passphrase altogether. I’d suggest that’s easier and more secure than storing the passphrase anywhere.

  24. If you’re having trouble with entropy gathering, running ls -lR / in the background (another login, another terminal window, or whatever) will help.

  25. I don’t think there’s any real need to have a full shell script. This is a one-liner: you can concatenate multiple shell variable definitions, and they will scope only over the one line.

    Here’s an example of what I mean:

    [07:27 PM] 332Mb$ BAR="bar"
    [07:27 PM] 332Mb$ FOO="foo" ls
    ….
    [07:27 PM] 332Mb$ echo $BAR
    bar
    [07:27 PM] 332Mb$ echo $FOO

    [07:27 PM] 332Mb$

    The $FOO is gone, but $BAR remains. (And nothing stops one from going 'FOO="foo" BAR="bar" … ls'.)

All content Copyright © 1999 — 2010 Randy Sesser | Happily Hosted by WebFaction