Oct 16, 2009

pefs dircache benchmark

I've recently added directory caching to pefs.

Besides being a directory listing cache (like dirhash for ufs), it also acts as a cache for encrypted file names, so there is no need to decrypt the names of the same entries over and over. That was a really big issue, because the directory listing has to be reread on almost every vnode lookup operation, which made operations on directories with 1000 or more files too time-consuming.

The cache is updated at two points: during the vnode lookup operation and during a readdir call. The vnode generation attribute is used to monitor directory changes (the same way NFS does) and to expire the cache when it changes. There is no per-operation monitoring because that would violate the stacked nature of the filesystem (and also complicate the code). There are some issues with handling large directories in dircache. First, the results of consecutive readdir calls are considered inconsistent, i.e. the cache expires if the user-provided buffer is too small to fit the entire directory listing. Second, a vnode lookup doesn't stop searching once a matching directory entry is found; it keeps traversing the directory to update the cache.
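
The generation-based expiry can be illustrated with a toy userland model. This is only a sketch of the idea, not pefs code: every name in it (dir_gen, cache_gen, dir_add, cache_lookup, refills) is hypothetical, and a plain counter stands in for the vnode generation attribute.

```sh
#!/bin/sh
# Toy model of generation-based cache expiry. All names are hypothetical;
# a plain counter stands in for the vnode generation attribute.

dir_entries="alpha beta"   # what the lower filesystem currently holds
dir_gen=1                  # bumped on every change to the directory

cache_entries=""           # cached listing (already "decrypted")
cache_gen=0                # generation at which the cache was filled
refills=0                  # how many times the listing had to be reread

# Any change to the directory bumps its generation.
dir_add() {
    dir_entries="$dir_entries $1"
    dir_gen=$((dir_gen + 1))
}

# Look up a name; reread the whole listing only if the generation moved.
cache_lookup() {
    if [ "$cache_gen" -ne "$dir_gen" ]; then
        cache_entries=$dir_entries
        cache_gen=$dir_gen
        refills=$((refills + 1))
    fi
    for e in $cache_entries; do
        [ "$e" = "$1" ] && return 0
    done
    return 1
}

cache_lookup alpha && echo "alpha: found (refills=$refills)"
cache_lookup beta  && echo "beta: found (refills=$refills)"
dir_add gamma
cache_lookup gamma && echo "gamma: found (refills=$refills)"
```

The first two lookups share one reread of the listing; only the change made by dir_add forces a second one.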

There is a vfs.pefs.dircache_enable sysctl to control cache validity. Setting it to zero forces the cache to always be treated as invalid, so dircache functions only as a file name encryption cache.

At the moment caching is only enabled for name decryption, but there are operations like rm or rmdir that perform name encryption on every call to pass data to the underlying filesystem. Enabling caching for such operations is not going to be hard, but I want the code to stabilize a bit before moving further.

I've performed two types of tests: dbench and handling directories with a large number of files. I've used pefs mounted on top of tmpfs to measure pefs overhead rather than disk I/O performance. The Salsa20 algorithm with a 256-bit key was chosen because it is the fastest available. The underlying tmpfs filesystem was remounted before each run. Each test was run 3 times, and the average of the results is shown in the charts (variation was less than 2%). Also note that I've used a kernel with some extra debugging compiled in (invariants, lock debugging).

dbench doesn't show a dramatic difference between pefs with dircache enabled and the old pefs without dircache: 143,635 Mb/s against 116,746 Mb/s. Still, that is roughly a 23% gain over the old code (the old result is about 19% lower), which is very good imho. It is also interesting that the result gets just a bit lower after setting vfs.pefs.dircache_enable=0: 141,289 Mb/s with dircache_enable=0 against 143,635 Mb/s.
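
As a quick check of the arithmetic, the relative differences can be computed with awk. This sketch assumes the comma in the figures above is a decimal separator (143.635 rather than 143635); pct_diff is a hypothetical helper name. Depending on which result is taken as the baseline, the dircache gap works out to roughly 23% or 19%.

```sh
#!/bin/sh
# pct_diff A B: percentage by which A differs from baseline B, one decimal.
# LC_ALL=C keeps awk parsing "." as the decimal point regardless of locale.
pct_diff() {
    LC_ALL=C awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

new=143.635            # dircache enabled
old=116.746            # old pefs without dircache
validity_off=141.289   # vfs.pefs.dircache_enable=0

echo "dircache relative to old pefs:     $(pct_diff "$new" "$old")%"
echo "old pefs relative to dircache:     $(pct_diff "$old" "$new")%"
echo "dircache_enable=0 relative to =1:  $(pct_diff "$validity_off" "$new")%"
```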

Dbench uses directories with a small number of entries (usually ~20), which perfectly explains these results. Handling large directories is where dircache shines. I've used the following trivial script for testing; it creates 1000 or 2000 files, does 'ls -l' and removes the files:
for i in `jot 1000`; do
    touch test-$i
done
ls -Al >/dev/null
find . -name test-\* -exec rm '{}' +

The chart speaks for itself. And per-file overhead looks much closer to the expected linear growth after running the same test for 3000 files.
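
To repeat the test at different directory sizes, the same steps could be wrapped in a function that takes the file count and the target directory. This is a minimal sketch, not the script used for the measurements: run_test and the scratch directory are hypothetical, a counter loop replaces jot so it also runs where jot is unavailable, and date +%s gives only whole-second timing. On a real run the target would be a directory on the pefs mount.

```sh
#!/bin/sh
# run_test N DIR: create N files in DIR, list them, then remove them.
run_test() {
    n=$1
    dir=$2
    (
        cd "$dir" || exit 1
        i=1
        while [ "$i" -le "$n" ]; do
            touch "test-$i"
            i=$((i + 1))
        done
        ls -Al >/dev/null
        find . -name 'test-*' -exec rm '{}' +
    )
}

# A scratch directory stands in for a directory on the pefs mount.
scratch=$(mktemp -d)
count=${1:-1000}

start=$(date +%s)
run_test "$count" "$scratch"
end=$(date +%s)
echo "files: $count, elapsed: $((end - start))s"

rmdir "$scratch"
```

Calling the script with 1000, 2000 and 3000 as its argument gives one data point per directory size.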

Oct 1, 2009

Encrypting private directory with pefs

pefs is a kernel-level cryptographic filesystem. It works transparently on top of other filesystems and doesn't require root privileges. There is no need to allocate another partition, take additional care of backups, resize the partition when it fills up, etc.

After installing pefs, create a new directory to encrypt. Let it be ~/Private:

% mkdir ~/Private

And mount pefs on top of it (root privileges are necessary to mount a filesystem unless you have the vfs.usermount sysctl set to non-zero):

% pefs mount ~/Private ~/Private

At this point ~/Private behaves like a read-only filesystem because no keys are set up yet. To make it useful, add a new key:

% pefs addkey ~/Private

After entering a passphrase, you can check active keys:

% pefs showkeys ~/Private
0 b0bed3f7f33e461b aes256-ctr

As you can see, the AES algorithm is used by default (in CTR mode with a 256-bit key). It can be changed with the pefs addkey -a option.

You should take into account that pefs doesn't save any metadata. That means there is no way for the filesystem to "verify" the key. To work around this, key chaining can be used (pefs showchain, setchain, delchain). I'm going to show how it works in upcoming posts.

Let's give it a try:

% echo "Hello WORLD" > ~/Private/test
% ls -Al ~/Private
total 1
-rw-r--r-- 1 gleb gleb 12 Oct 1 12:55 test
% cat ~/Private/test
Hello WORLD

Here is what it looks like at the lower filesystem level:

% pefs unmount ~/Private
% ls -Al ~/Private
total 1
-rw-r--r-- 1 gleb gleb 12 Oct 1 12:55 .DU6eudxZGtO8Ry_2Z3Sl+tq2hV3O75jq
% hd ~/Private/.DU6eudxZGtO8Ry_2Z3Sl+tq2hV3O75jq
00000000 7f 1e 1b 05 fc 8a 5c 38 fc d8 2d 5f |......\8..-_|

Your result is going to be different because pefs uses a random tweak value to encrypt files. This tweak is saved in the encrypted file name. Using the tweak also means that identical files have different encrypted content.