Archive for June, 2007

Building code (Part II) – dependency generation

Wednesday, June 27th, 2007

Automatic dependency generation can make a huge difference on productivity. If you have a large project then building every source file, every time in a code and fix cycle can grind the process to a halt. Likewise, if your build process doesn’t rebuild any object file that already exists, or only rebuilds it when the corresponding source file has been updated without taking into account updated header files, then you can end up chasing phantom bugs due to incompatible object files. To get around this, without taking the expensive hit of a full rebuild, you tend to end up manually deleting groups of critical object files which you think are the affected ones and attempting to use an incremental build.

A working autodependency system should make incremental builds as minimal as possible, but no more minimal than that. Every time you hit make, everything that should be rebuilt is, and there is no manual upkeep of complex source file dependencies.

For some time now, many compilers have provided an alternative preprocessing switch which, instead of outputting the normal preprocessed code, outputs a makefile fragment which describes the object file dependencies on the source file and all the header files which are included, both directly and indirectly. This fragment, which contains dependency only rules (i.e. they do not specify a build command) can then be included in a larger makefile to form a functioning makefile with complete dependencies.

gcc has the -M switch, which works as described, and the -MM which works similarly but omits system header files. I tend to favour the latter since system header files change infrequently and you usually know when they have (e.g. a major system upgrade). When such an event occurs, usually every file in the project is outdated anyway, so a manually clean is no particular hardship. The generated makefiles without the system header files are usually a lot more compact.

For a file test.c that includes test.h but no other non-system header files, you usually get a rule in the generated test.d makefile which is something like this:

test.o: test.c test.h

This is exactly what is required so usually you place a rule in the project makefile along the lines of:

test.d: test.c
   gcc -M -o test.d test.c

include test.d

Due to the magic way make works, make will spot that while it can’t directly include test.d as it doesn’t yet exist, there is a rule to make it. Make will then make it – and any other included makefile that is not up to date that it has a rule for making – and restart the original makefile parsing step so that it can now include this makefile.

This is all well and good, as when you change test.h to #include “test2.h” make knows that test.o is out of date and needs to be rebuilt. The problem is that the test.d makefile has not been rebuilt so now there is an indirect dependency from test.o to test2.h, but test.d has not been rebuilt to reflect this. Previously, in the dark ages, the standard technique was to change the rule from generating the test.d file directly, to pass the output through a really ugly sed script instead. The script would replace the occurrences of ‘test.o’ with ‘test.d test.o’ so that test.d had the same dependencies as test.o and was correctly updated when the dependencies of test.o were updated. (What makes the sed script really ugly is usually the fact that it is defined in a pattern rule such as %.d: %.c and has to work with the make automatic variables like $@ and $< as well as regular expression syntax to do its magic. Sometimes the rule uses a temporary file for the original compiler dependency output, sometimes you are able to get the compiler to filter it straight into sed.)

The final thing that always used to irritate me about the sed script is that usually the compiler generates a makefile where the lines are all kept neatly to 80 column lines and line continuation syntax (’\’ newline) is used to avoid line wrapping. Passing this through a sed script adding a dependency almost always causes the first line to wrap. Who cares? Nobody reads automatically generated dependency makefiles and it doesn’t affect their functionality! I know it shouldn’t matter, but I don’t need to look at them, I know that they’re badly formatted makefiles sitting there and it irritates me.

Fortunately, with modern gcc, there is a better way. You can use -MF to specify the output file to write the dependency rule and then you can use either -MT or -MQ to specify what you want to appear as targets to the generated dependency rule in the makefile that is written out. In our case we want the following:

gcc -MM -MF test.d -MT test.o -MT test.d

The -MQ option does the same thing as -MT but automatically escapes any characters that are special to make, so that when the resulting makefile is parsed by make everything reads correctly. Sounds like a really useful feature, but none of my source files have a $ in their name, so I haven’t really found a need for it.

The final icing on the cake with gcc is that you can use the -MD and -MMD options. These do exactly the same thing as the -M and -MM options except that they don’t stop the rest of the processing, so you can go on and complete the compilation of the source file in question. In effect, they make the dependency generation a side effect of the compile step.

test.o test.d: test.c
    gcc -MMD -MT test.d -c -o test.o test.c

In this case, gcc automatically includes an implied -MQ with whatever the output file is, so any other -MQ and -MT options are added to this.

In practice, I don’t really like this last form of dependency generation. Every time you run make all the *.d files are made up to date (they are included in the overall makefile so they must be remade before any specified targets are built). As the *.d files are a side effect of compilation, this means that all the out of date object files are automatically remade. In effect, compilation becomes a side effect of dependency generation. This has the slightly bizarre effect that if you have no generated dependencies and no object files (a build from ‘distclean’), then running a ‘make clean’ – just to be sure – compiles all your object files and generates all your dependencies and then immediately deletes all the object files again.

So I use the explicit “dependency only” method to generate dependency makefiles.

[Note: For simplicity I have only really described explicit rules. Typically dependency makefiles are made though pattern rules of the form %d: %c using automatic variables such as $@ in the command syntax. Also for simplicity I have left out the compiler preprocessor and include options. For reliable dependency generation these should be identical to the options used the the compile step itself.]

Building Code (Part I) – Makefiles

Tuesday, June 26th, 2007

Makefiles are one of the things that I don’t like spending a lot of time working with, they’re there to support writing software so they should be easy and simple to work with, but they should “just work”.

So naturally, I’ve spent a lot of time working on a setup that “just works”.

There are a number of approaches to make systems which have various strengths and weaknesses.

One approach is to have the makefiles generated by a higher level system, usually written in a scripting language, which generates some makefiles. Makefile syntax isn’t a programming language, so if you attempt to do too much programming with it you can end up fighting against the tool and not working with it so this initially seems like an attractive idea. The downside is that you have to “regenerate” makefiles and this can be slower than necessary and, being outside of make, it is not easy to check when the makefiles need to be generated or to regenerate them incrementally. Some systems also generate makefiles which have poor dependency support, or which use make in a recursive fashion. Most systems are more restrictive than using make directly as they enforce a pattern. This can be considered a strength, as not following a set pattern can lead to a maintenance nightmare, but the down side is that you can end up fighting the system to do anything slightly out of the ordinary.

Oh, and if you haven’t read Paul Miller’s 1997 paper “Recursive Make Considered Harmful”, then I thoroughly recommend it.

The alternative is to manage one or more makefiles manually. Evidently makefiles can grow into unmaintainable monsters if they are not carefully managed so the usual way to manage this is by using a project “template” with well defined places where component specific files and settings are specified. This is my preferred approach but getting the template correct is the trick.

There are a number of features that I would like to see in a makefile system.

  • Dependencies for C and C++ sources should be generated automatically.
  • It must be be easy to add new projects and new files to existing projects.
  • It needs to support alternative source files, and generated source files (e.g. assembler files, [f]lex, yacc/bison files)
  • It must be possible to build different configurations from the same source. Which implies…
  • It must build automatically into a separate build directory, so .o files shouldn’t appear in the same directories as .c[c] files.

Is it possible to do all this with a “templated” make system?

(Hint: The answer is yes.)

Ramble on

Monday, June 25th, 2007

Well the upgrade to Wordpress 2.2.1 seemed to go relatively harmlessly and while I was at it I set up a cron job to backup my mysql database (not my choice, no postgresql option) to my home computer.

ssh is a very powerful tool. My backup across the network is a really simple shell script along the lines of:

#!/bin/sh
FNAME=${HOME}/mysqlbackup_`date +%F_%H_M_S`.sql.bz2
ssh myispaccount "mysqldump -h ispdbserver -u ispusername -pmindyourown\
    mydbname | bzip2 -9" >${FNAME} 2>/dev/null

ssh makes things really easy as when you specifiy a command, the things you want to happen to stdout and stdin actually happen. So in this example, bzip2 is part of the command sent to the server so the bzipping happens on the remote end, and ssh forwards the stdout of the remote command back to the local machine where I redirect it straight to a file. If I wanted I could have bunzipped it on the local end, but there’s no need. Having the backup saved as a timestamped bzip2ped sql file suits my needs.

Try this for funky mirroring:

ssh remotehost "tar -c -C remotefolder . | bzip2 -9" | tar -x -j -C localfolder

There’s a slight issue with tar and ssh which means that when you use a compression filter as a tar option (-j or -z) on the remote side it seems to send some trailing garbage at the end, rounding the communication up to some minimum block size. The untar deals quite happily with it, but bzip2 does warn. If you use a pipe at the remote end, this doesn’t seem to happen.

restrict, yes it does make a difference

Friday, June 22nd, 2007

restrict is a new keyword in C99 (ISO/IEC 9899:1999) designed to enable the compiler to make more optimizations in response to programmers’ guarantees. You can read the full meaning of the keyword in the standard but an oversimplification would be that if a pointer is marked as restrict then it is a guarantee that the objects referred to through that pointer are not accessible in any other way in the block containing the pointer’s scope.

Here’s an example. Suppose we have a function that takes a pointer to two ints which are consecutive members of the Fibonacci sequence and the function is required to change the values of the two ints to the next distinct pair of numbers in the sequence. In other words, if the initial terms are A and B, with A being the later member of the sequence, then the next two terms are A + B and (A + B) + A = 2A + B. You might do this as follows:

void fibincr2(int *a, int *b)
{
    *a += *b; // *a contains A + B
    *b -= *a; // *b contains -A
    *a -= *b; // *a contains 2A + B
    *b += *a; // *b contains A + B
}

Here is the assembly generated by gcc -O3 -S -std=c99 -pedantic -Wall -fomit-frame-pointer with my comments added. I’ve taken A and B to be the initial values of *a and *b and put the value of the result of each instruction in the right hand column in terms of A and B.

Note where we reload eax from *a despite the fact that eax was stored to *a earlier and eax was not change in the intervening code.

fibincr2:
    pushl   %ebx            # Save ebx register
    movl    8(%esp), %ebx   # ebx is a (type int*)
    movl    12(%esp), %ecx  # ecx is b
    movl    (%ebx), %eax    # eax = *a      A
    addl    (%ecx), %eax    # eax += *b     A + B
    movl    %eax, (%ebx)    # *a = eax      A + B
    movl    (%ecx), %edx    # edx = *b      B
    subl    %eax, %edx      # edx -= eax    -A
    movl    %edx, (%ecx)    # *b = edx      -A
    movl    (%ebx), %eax    # eax = *a      A + B (so long as a and b are distinct)
    subl    %edx, %eax      # eax -= edx    2A + B
    movl    %eax, (%ebx)    # *a = eax      2A + B
    addl    %eax, (%ecx)    # *b += eax     A + B
    popl    %ebx            # restore ebx register
    ret

Of course, this all breaks down when a and b are pointing at the same int, but then the function is useless in this case so we may as well state this.

void fibincr2(int * restrict a, int * restrict b)
{
    *a += *b; // *a contains A + B
    *b -= *a; // *b contains -A
    *a -= *b; // *a contains 2A + B
    *b += *a; // *b contains A + B
}

And now the new assembly:

fibincr2:
    pushl   %ebx            # Save ebx register
    movl    8(%esp), %ebx   # ebx is a (type int*)
    movl    12(%esp), %ecx  # ecx is b
    movl    (%ebx), %eax    # eax = *a      A
    movl    %eax, %edx      # edx = eax     A
    negl    %eax            # eax = -eax    -A
    addl    (%ecx), %edx    # edx += *b     A + B
    movl    %eax, (%ecx)    # *b = eax      -A
    subl    %eax, %edx      # edx -= eax    2A + B
    addl    %edx, (%ecx)    # *b += edx     A + B
    movl    %edx, (%ebx)    # *a = edx      2A + B
    popl    %ebx            # restore ebx register
    ret

Note that now the pointers are marked as restrict the compiler no longer has to generate a read for the *a after write to *b if it already has the result of a previous read (or write to) *a in a register, as the compiler now knows that the write to *b cannot have affected *a.

So previously, ignoring the initial load of parameters, we had six “mov” (one register only), two “add” (both involving memory) and two “sub” (both register only).

After adding restrict we now have four “mov” (one register only), two “add” (both involving memory), one “sub” and one “neg” (both register only). In other words we’ve save two memory access instructions in a ten instruction algorithm. Go us.

Of course, in this case, the use of a temporary would produce even more efficient code.

void fibincr2(int * restrict a, int * restrict b)
{
    int c = *a;   //  c contains A
    *a += c + *b; // *a contains 2A + B
    *b += c;      // *b contains A + B
}
fibincr2:
    pushl   %ebx               # save ebx register
    movl    8(%esp), %ebx      # ebx is a (type int*)
    movl    12(%esp), %edx     # edx is b
    movl    (%ebx), %ecx       # ecx = *a           A
    leal    (%ecx,%ecx), %eax  # eax = ecx + ecx    2A
    addl    (%edx), %eax       # eax += *b          2A + B
    addl    %ecx, (%edx)       # *b += ecx          A + B
    movl    %eax, (%ebx)       # *a = eax           2A + B
    popl    %ebx               # restore ebx register
    ret

Two “mov”, two “add” and a “lea”. Go us.

Addendum
The example function and original algorithm was chosen for this entry because it exhibits behaviour that is affected by using the restrict keyword and because the function performs a readily comprehensible action that isn’t stretching the bounds of “plausibly useful” too much. The final algorithm shows that this particular problem can be solved in a more efficient way, but I was struggling to find an example that was succinct enough without being overly artificial.

Two things. Oh, and one other thing.

Wednesday, June 20th, 2007

There are two things you need to remember about smart pointers. First, they are not pointers, and second, they are not smart.

One thing you should know about typedefs. They don’t define new types.

iostreams: Why so little information?

Wednesday, June 20th, 2007

Most people’s exposure to C++ streams is with a “Hello, world” style program.

std::cout << "Hello, world!\n";

From there you move on to how you can “<<” different types and how you can produce the equivalent of something like the code below with only about 3 times as many characters.

printf("%08x : %.17g", uParm1, dParm2);

It’s not exactly a winning argument for using iostreams over printf and its friends.

So where is the power of iostreams? Well, stage two comes with the introduction of the ability to define your own insertion and extraction operations.

std::ostream& operator<<(std::ostream& , const MyClass&);
std::istream& operator>>(std::istream& , MyClass&);

The syntax isn’t trivial for beginners and, due to the way that the operators take a standard class on the left, you probably have to introduce the concept of friendship, but it’s pretty trivial.

So this is all well and good. We’ve created our own types and because we have defined the insertion and extraction operators in terms of the istream and ostream classes we can take advantage of writing to string, file, anything in fact!

Excellent, lets write an ostream that writes to a MyBucket class. By this I mean that I have a MyBucket class that can consume bytes and arranges them prettily in a plastic bucket. I’d like to wrap an ostream interface around MyBucket so that objects that have an operator<< onto and ostream can write into MyBucket. Should be easy, right?

So how do we do that?

This is where I found there to be a lack of readily available documentation how extend the iostream library in this way.

Naïvely, you might think that we have an inteface class, ostream, and we want to change its functionality so perhaps we should derive a class from it and override the functions we want to change. If you have a look at the documentation for ostream you’ll discover that there are virtually no virtual functions. It does have a virtual destructor. This means that if you pass it into a function like std::ostream& operator<<(std::ostream&, const SomeoneElsesClass&); then you’re not going to be able to influence anything that happens inside this class from your derived ostream.

The “correct” answer is that you need to write a specialized std::streambuf. The streambuf interface abstracts the raw byte buffer functionality of files, strings, etc. from the [io]stream classes and allows the [io]stream class to perform all the formatting functionality that is what they are there for.

Once you have a class MyBucketStreamBuf : public std::streambuf { ... you can then construct [io]streams around it an pass them around as any other [io]stream class. ostream (really std::basic_ostream) has an explicit constructor that takes a pointer to (any form of) streambuf, or if you want to you can now derive from ostream and have it manage the lifetime of the custom streambuf class, and initialize the base ostream class with a pointer to it. Easy!

So why is there so little documentation on how to do anything more with iostreams than define your own insertion and extraction operations?

Entry number one

Tuesday, June 19th, 2007

Well this is it, it looks like I’ve managed to install Wordpress in my webspace. Now I just need something to write about…