Archive for December, 2007

Don’t try for exception safety

Thursday, December 20th, 2007

Achieve it!

“Do or do not, there is no try” – Yoda, on strong exception safety.

I’ve decided that I like exceptions a lot more now that I know how to use them to my advantage. Take the following piece of code:


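#include <cstddef>
#include <cstring>
using std::size_t;
using std::memcmp;

// Thrown by test_conv when a round trip does not match.
class TestFailed {};

size_t DoConversion( char* dst, size_t dst_len, const char* src, size_t src_len );
size_t DoConversion2( char* dst, size_t dst_len, const char* src, size_t src_len );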

void test_conv( const char* test_data, size_t test_len )
{
        size_t bufsize = DoConversion( NULL, 0, test_data, test_len );

        char* buffer = new char[ bufsize ];

        DoConversion( buffer, bufsize, test_data, test_len );

        size_t bufsize2 = DoConversion2( NULL, 0, buffer, bufsize );

        if( bufsize2 != bufsize )
                throw TestFailed();

        char* buffer2 = new char[ bufsize2 ];

        DoConversion2( buffer2, bufsize2, buffer, bufsize );

        if( memcmp( buffer, buffer2, bufsize ) != 0 )
                throw TestFailed();

        delete[] buffer2;

        delete[] buffer;
}

Assume that DoConversion and DoConversion2 are “traditional” character string conversion functions. They take a source buffer and a destination buffer, whose memory they don’t manage, and convert the contents of one into the other. If you supply a null destination buffer, they tell you how big the destination buffer would have to be to complete the conversion, without actually performing it. Assume also that they are less traditional in that they may throw a BadThing exception if something doesn’t work.
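To make the two-call protocol concrete, here is a minimal sketch of what a function of this shape might look like. The name ExampleConversion, the upper-casing conversion and the size rule are all assumptions for illustration. The post does not say what DoConversion actually converts, and BadThing is just an empty struct here.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical exception type standing in for the post's BadThing.
struct BadThing {};

// Hypothetical "traditional" conversion: upper-cases ASCII text.
// The caller owns both buffers; this function never allocates.
// If dst is NULL, nothing is converted and the required size is returned.
std::size_t ExampleConversion( char* dst, std::size_t dst_len,
                               const char* src, std::size_t src_len )
{
        // Size query: this conversion writes one byte per input byte.
        if( dst == NULL )
                return src_len;

        // The caller's buffer is too small to hold the result.
        if( dst_len < src_len )
                throw BadThing();

        for( std::size_t i = 0; i != src_len; ++i )
                dst[i] = static_cast<char>(
                        std::toupper( static_cast<unsigned char>( src[i] ) ) );

        // Number of bytes written to dst.
        return src_len;
}
```

The caller makes one call with a null destination to size the buffer and a second call to fill it. This is the same two-step pattern that test_conv relies on.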

The test_conv function is obviously not exception safe, and in multiple ways. Trying to make it exception safe in a naive way – by adding some try/catch pairs – is verbose and error prone. I came up with this, but I have low confidence in the result (and I actually know of one definite reason why it is not exception safe).

void test_conv( const char* test_data, size_t test_len )
{
        size_t bufsize = DoConversion( NULL, 0, test_data, test_len );

        char* buffer = new char[ bufsize ];

        size_t bufsize2;

        try
        {
                DoConversion( buffer, bufsize, test_data, test_len );

                bufsize2 = DoConversion2( NULL, 0, buffer, bufsize );
        }
        catch( ... )
        {
                delete[] buffer;
                throw;
        }

        if( bufsize2 != bufsize )
        {
                delete[] buffer;
                throw TestFailed();
        }

        char* buffer2 = new char[ bufsize2 ];

        try
        {
                DoConversion2( buffer2, bufsize2, buffer, bufsize );
        }
        catch( ... )
        {
                delete[] buffer2;
                delete[] buffer;
                throw;
        }

        int res = memcmp( buffer, buffer2, bufsize );

        delete[] buffer2;
        delete[] buffer;

        if( res != 0 )
                throw TestFailed();
}

This is ugly in so many ways. buffer is allocated in one place and deallocated in one of four places (or even not at all!), depending on the particular path followed; bufsize2 now has to be declared before we can sensibly initialize it; and the result of memcmp is cached so that deallocation can take place before deciding whether to throw (this was mildly shorter than duplicating the two delete statements yet again).

So here’s the answer: write a new class.

class AutoCharArray
{
public:
        // explicit: a bare size_t should not silently convert to an AutoCharArray
        explicit AutoCharArray( size_t s ) : _buffer( new char[s] ) {}

        ~AutoCharArray() { delete[] _buffer; }

        operator char*() const { return _buffer; }

private:
        // No copying
        AutoCharArray( const AutoCharArray& );
        AutoCharArray& operator=( const AutoCharArray& );

        char* _buffer;
};

void test_conv( const char* test_data, size_t test_len )
{
        size_t bufsize = DoConversion( NULL, 0, test_data, test_len );

        AutoCharArray buffer( bufsize );

        DoConversion( buffer, bufsize, test_data, test_len );

        size_t bufsize2 = DoConversion2( NULL, 0, buffer, bufsize );

        if( bufsize2 != bufsize )
                throw TestFailed();

        AutoCharArray buffer2( bufsize2 );

        DoConversion2( buffer2, bufsize2, buffer, bufsize );

        if( memcmp( buffer, buffer2, bufsize ) != 0 )
                throw TestFailed();

}

AutoCharArray just manages the lifetime of the dynamically allocated char array as a C++ object. Because of this, we never have to worry about catching and rethrowing foreign exceptions. Because it is a C++ object, if it has been successfully constructed as a local object, it will be destroyed when the function exits, whether normally or via an exception. We don’t even have to worry about “new” failing: if new throws, the constructor will not have completed, so the destructor will not be called on a bad pointer.
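Both guarantees in the previous paragraph can be checked directly. The sketch below adds a destructor counter to a copy of AutoCharArray and simulates an allocation failure by throwing std::bad_alloc before the member is initialized. The names g_destroyed, allocate_chars, TrackedArray and two_buffers are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>

// Instrumentation for this sketch only: counts completed destructor calls.
static int g_destroyed = 0;

// Hypothetical helper: behaves like new char[s], but can be told to fail
// so that we can observe what happens when allocation throws.
static char* allocate_chars( std::size_t s, bool fail )
{
        if( fail )
                throw std::bad_alloc();
        return new char[s];
}

// Same shape as AutoCharArray, plus the destructor counter.
class TrackedArray
{
public:
        explicit TrackedArray( std::size_t s, bool fail_alloc = false )
                : _buffer( allocate_chars( s, fail_alloc ) ) {}

        ~TrackedArray() { delete[] _buffer; ++g_destroyed; }

        operator char*() const { return _buffer; }

private:
        // No copying
        TrackedArray( const TrackedArray& );
        TrackedArray& operator=( const TrackedArray& );

        char* _buffer;
};

// Builds two buffers and then optionally throws, as test_conv might.
void two_buffers( bool fail )
{
        TrackedArray a( 16 );
        TrackedArray b( 32 );

        if( fail )
                throw std::runtime_error( "simulated TestFailed" );
}
```

When two_buffers throws, both a and b are still destroyed during unwinding. When the allocation inside the constructor throws, the constructor never completes, so ~TrackedArray never runs and delete[] is never applied to an uninitialized _buffer.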

As well as all these advantages, the control flow for the optimistic ‘success’ use case is obvious and easy to follow. It is not cluttered with a ton of “but in case that didn’t work” catch blocks. Overall, including the class definition, the entire code is no longer than the long-winded “try/catch” quagmire of the first attempt.

A simple “AutoArray” class is very useful for this type of application, although I tend to prefer it as a template:

template< class T >
class AutoArray
{
public:
        explicit AutoArray( size_t s ) : _buffer( new T[s] ) {}

        ~AutoArray() { delete[] _buffer; }

        operator T*() const { return _buffer; }

private:
        // No copying
        AutoArray( const AutoArray& );
        AutoArray& operator=( const AutoArray& );

        T* _buffer;
};

git backups interacting with git

Wednesday, December 5th, 2007

This is really important!

When git is used as a generalized backup utility, it interacts in an ‘interesting’ way with any git repositories that it finds.

It treats each one as a submodule, so instead of backing up the repository itself, it just records a reference to the submodule’s current HEAD commit.

I believe that this is “by design”. However, if you don’t set up the submodule configuration, your backup repository won’t know where to find the repository that contains the recorded commit.

It also means that you need to be git pushing your precious repository data somewhere safe in any case.

git backup is also about three times slower than my tar based incremental backup, although incrementally saving the backups to a remote machine is quicker and backup browsing and recovery is a little easier.

git as a general purpose backup utility

Monday, December 3rd, 2007

When it was first suggested to me that you could just use git for backup I was not convinced. You would have these massive .git directories in high level places on your filesystem for one.

Now that I’ve had some time to reflect on the possibility, I think that perhaps it isn’t such a crazy idea. It’s not actually true that you have to have a .git directory in the place that you want to back up. In fact, I am even trialling git alongside my regular “tar”-based backup.

Here’s what I do. Suppose, for the sake of example, that I’m going to backup /home onto a separate backup partition called /backup.

Step 1 – Create a git repository for the backup

mkdir /backup/home.git
git --git-dir=/backup/home.git --work-tree=/home init

[
Before I discovered git’s --work-tree option, I used to do this as follows, running the commands from inside /backup/home.git. It has the same effect.

git --bare init
git config core.bare false
git config core.worktree /home

]

Step 2 – Initial backup

cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -m "Initial /home backup"

Step 3 – Copy backups to a safe remote machine

Assuming that you have a second machine with git installed, and that you have ssh access to it, you can initialize a new empty git repository there for this purpose. Suppose that this machine is called other-machine and the repository is located at /backup/first-machine/home.git.

The initial remote backup is performed thus.

cd /backup/home.git
git remote add other-machine ssh://other-machine/backup/first-machine/home.git
git gc
git push other-machine master

The git gc seems fairly important. At this stage you have a massive git repository that hasn’t yet been packed. When you attempt to push it, git will want to perform a big “Deltifying” step to create a pack for the remote side. If you run git gc on the local side first, it performs the big “Deltifying” step there and stores the result as a local pack. The push can then reuse that pack, and subsequent local operations also benefit from it. If you let the push do the packing instead, that work is not kept on the local side.

Step 4 – Incremental backup

cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -a -m "Incremental /home backup"

Performing both an “add” and a “commit -a” looks repetitive but is required as “commit -a” does not add new untracked files and “add” doesn’t ‘add’ file deletions to the index.

Step 5 – Push incremental backup to remote machine

cd /backup/home.git
git push other-machine master

Well, that was easy.

Disadvantages
The initial “git gc” step can be very slow.
git does not store owner/group information or access/modification times (atime and mtime). The backup is content only.
“git add .” is not robust against files that disappear while git’s looking at them (e.g. lock files). It tends to fail with a “cannot stat” message when you really want it to not bother with that file and carry on.