Apr 14, 2011

Secure backups for a lazy developer

Developers are always afraid of losing source code. As a rule, after a crash you can restore everything except the last few revisions, or you recover the sources but the repository is damaged. It doesn't happen often, but it's better to feel safe.

Backing up a central repository on a server and backing up personal projects are two very different stories. Developers are too lazy to use server-style backup methods. It's quite common to create an archive of all projects, encrypt it, and store it somewhere. Writing the archive to CD-R or other media is time-consuming, and the discs tend to get lost the day after you burn them. Another option is an online backup service, especially since 1-2 GB is often available for free. The process can be automated, but it has several limitations: the entire archive has to be uploaded or downloaded every time, restoring data is cumbersome, and managing encryption keys and passwords is complicated.

My list of desired features for a backup system:
  • Store data encrypted
  • Download/upload only bits that changed
  • Automated synchronization

To achieve these goals I use a stacked cryptographic file system and a free online storage service: PEFS and Ubuntu One. PEFS is a kernel-level stacked file system for FreeBSD, and Ubuntu One seems to be the only service with an open source client that supports synchronization. Linux users could use ecryptfs or encfs with Dropbox, or a similar setup.

I back up git repositories, but the approach should work for regular data as well. git was chosen because it stores revision deltas in separate pack files rather than in individual files, i.e. a new backup creates a new file and leaves existing objects intact. Another reason is to prevent inconsistent syncs when using several clients: how the service provider performs a merge (if any) remains a mystery, but with a version control system the merge is easy to control.

Layout:
/backup/.encrypted - encrypted data synced with online storage
/backup/local - decrypted local representation


Create backup file system:
# # (optional) zfs create tank/backup
# mkdir -p /backup/{.encrypted,local}
# pefs addchain -fZ /backup/.encrypted
Enter password


Mount encrypted filesystem and add key:
# pefs mount /backup/.encrypted /backup/local
# pefs addkey -c /backup/local
Enter password
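
Anything written to /backup/local while PEFS is not mounted lands in a plain directory and never reaches the encrypted tree, so it may be worth checking the mount first. The sketch below is one possible guard, not part of PEFS itself. The function name is_pefs_mounted is hypothetical, and the sketch assumes FreeBSD-style mount(8) output of the form "/backup/.encrypted on /backup/local (pefs, local)" and mount points without spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads a mount table on stdin and succeeds only if
# a pefs file system is mounted at the mount point given as $1.
# Assumes lines shaped like: "<from> on <mountpoint> (pefs, ...)".
is_pefs_mounted() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && $4 ~ /^\(pefs[,)]/ { found = 1 }
        END { exit !found }
    '
}
```

It could be used as: mount | is_pefs_mounted /backup/local || echo "PEFS is not mounted"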


Back up a project:
# git init --bare /backup/local/project1.git
# cd project1_dir

# git remote add backup /backup/local/project1.git
# git push --mirror backup
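
With more than one project, the same three commands can be wrapped in a small helper. This is a minimal sketch rather than the setup actually used here: the function names backup_target and backup_project and the BACKUP_ROOT variable are assumptions made for illustration, and it relies on git -C and git remote get-url, which need a reasonably recent git.

```shell
#!/bin/sh
# Decrypted side of the backup file system (assumed default, as in the layout).
BACKUP_ROOT="${BACKUP_ROOT:-/backup/local}"

# Print the bare repository path used for a project directory.
backup_target() {
    printf '%s/%s.git\n' "$BACKUP_ROOT" "$(basename "$1")"
}

# Mirror one working copy into the backup file system.
backup_project() {
    src=$1
    dst=$(backup_target "$src")
    # Create the bare repository on first use only.
    [ -d "$dst" ] || git init --bare "$dst" || return 1
    # Add the "backup" remote only if it is not configured yet.
    git -C "$src" remote get-url backup >/dev/null 2>&1 ||
        git -C "$src" remote add backup "$dst" || return 1
    git -C "$src" push --mirror backup
}
```

For example: for d in ~/src/*/; do [ -d "$d/.git" ] && backup_project "$d"; done (the ~/src location is only an example).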


Here is what encrypted data looks like:
# ls -A /backup/.encrypted
.D3ForUU+Xh8DEL3b1oRGYfD57VKQqLahzYZnHRjINSDT3hqJMRAPqA
.Gz8xQqNAzQFqQ4CiOZPGSlEIbf+tVvZHXG1SisReRxfwqpKJK0VYvA
.O96wecIt1g4YnhFTTp3KTW2mWFk33vQBt4ZBvX9ZbMPP5HCd0INbgg
.pefs.db


The u1sync command line tool is used to sync data. You need to register at
Ubuntu One and extract the OAuth authentication tokens; a step-by-step guide is here.

Initialize shared directory:
# u1sync --oauth FOO:BAR --init /backup/.encrypted

Sync it:
# u1sync --oauth FOO:BAR /backup/.encrypted

There is no u1sync port for FreeBSD yet, so you have to install all dependencies from ports and get it running by hand. To speed things up I've also extended u1sync to store the OAuth tokens and created /backup/Makefile with mount/unmount/sync targets.
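
The Makefile itself is not shown here. As a rough shell equivalent of the three targets, something like the sketch below could work. It is not the author's Makefile: the function and variable names are hypothetical, FOO:BAR is the same placeholder as above, and by default the helpers only print the commands (set DRY_RUN=0 to run them).

```shell
#!/bin/sh
# Paths from the layout above; the OAuth value is a placeholder.
BACKUP_ENCRYPTED="${BACKUP_ENCRYPTED:-/backup/.encrypted}"
BACKUP_LOCAL="${BACKUP_LOCAL:-/backup/local}"
U1_OAUTH="${U1_OAUTH:-FOO:BAR}"

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the encrypted tree and add the key (pefs prompts for the password).
backup_mount() {
    run pefs mount "$BACKUP_ENCRYPTED" "$BACKUP_LOCAL" &&
        run pefs addkey -c "$BACKUP_LOCAL"
}

# Unmount the decrypted view.
backup_unmount() {
    run pefs unmount "$BACKUP_LOCAL"
}

# Push the encrypted tree to the online storage.
backup_sync() {
    run u1sync --oauth "$U1_OAUTH" "$BACKUP_ENCRYPTED"
}
```

After sourcing the file, DRY_RUN=0 backup_mount followed by DRY_RUN=0 backup_sync would perform the real mount and sync. Only the encrypted directory is ever handed to u1sync.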

Jan 22, 2011

PEFS changelog

PEFS changelog since September 2010:
  • Add AESNI hardware acceleration support.
  • Several rename fixes: vnode reference leak, incorrect locking, livelock, missing lookup(), always perform nfs-style dummy rename.
  • Skip directory entries with zero inode number (empty entry) (could result in reusing invalid entries).
  • Fix mounting ZFS snapshots (incorrect vn_fullpath locking).
  • Reduce possibility of free vnode shortage livelock by freeing vnode in-place for non-ZFS file systems and if called from vnlru proc in ZFS case. Add asyncreclaim mount option.
  • Add missing vnode operations: vop_pathconf, vop_getacl. Improve error reporting in link() and truncate().
  • Report correct max name size and max symlink size supported.
  • Always use 4 KB blocks to support architectures with large page sizes in the future.
  • Use AES128-CTR to encrypt keys in chain database, simplify Key-Encryption-Key generation procedure. Database has to be recreated anew after the change.