Archive for the ‘c++’ Category

extern arrays and pointers

Saturday, November 17th, 2007

Quick, what’s the difference!

extern char data[];
extern char *data;

Well, the first one’s an array and the second one is a pointer. In most expressions you can use them in the same way, but they are completely different things.

Here’s a function which looks the same with both declarations of data:

char f(int a)
{
    return data[a];
}

But if data is an array it compiles to the following. (gcc -O3 -fomit-frame-pointer)

f:
    # move the function parameter from the stack into %eax
    movl    4(%esp), %eax

    # load the byte at data + %eax, sign-extended, into %eax
    movsbl  data(%eax),%eax

    ret

And if it’s a pointer it compiles to this.

f:
    # move the pointer value at the memory location data into %eax
    movl    data, %eax

    # move the function parameter from the stack into %edx
    movl    4(%esp), %edx

    # load the byte at %edx + %eax, sign-extended, into %eax
    movsbl  (%eax,%edx),%eax

    ret

It’s an easy enough mistake to make, but I find it unintuitive that changing the external declaration silently changes the code generated for the function, without any change to the function definition itself.

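It’s important to get external declarations consistent: if the definition is an array but the declaration says pointer, the function loads the first few bytes of the array and treats them as an address. This can be a problem if the array data lives in a non-C (or non-C++) file, e.g. test data included via an assembly file.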
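To see the two kinds of object side by side, here is a minimal sketch that defines one of each in a single file, so the declarations cannot disagree. The names array_data and pointer_data and the helper functions are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical names, defined here so that the sketch is self-contained.
char array_data[] = "hello";         // the symbol labels six bytes of character storage
const char* pointer_data = "hello";  // the symbol labels one pointer; the characters live elsewhere

// Identical source text, different code: the array version indexes straight
// off the symbol's address, the pointer version has to load the pointer first.
char from_array(int i)   { return array_data[i]; }
char from_pointer(int i) { return pointer_data[i]; }

// sizeof makes the difference in the objects themselves visible.
std::size_t array_bytes()   { return sizeof array_data; }    // 6: five letters plus the terminator
std::size_t pointer_bytes() { return sizeof pointer_data; }  // the size of a pointer on the target
```

Both accessors return the same characters here because each declaration matches its definition. The trouble described above only starts when an extern declaration in another file claims the other kind of object.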

A mini test framework in a single header file

Wednesday, October 31st, 2007

After trying it on a number of projects, I’m now very enthusiastic about test driven development. At home I’ve rather missed the minimal support code that I had at my old job, so I’ve rewritten a miniature test framework in a single header file.

Only this time, it’s better. Framework implies something rather big. This isn’t. The design goal was to make something really simple and lightweight that just makes the process of writing tests as simple as possible with as few overheads as possible.

In this framework a test function is just a function that takes no parameters and returns void. If the function doesn’t throw an exception when it is called then it has passed.

Here are some example tests, which show the three different types of assert macro provided. (Yes, all the tests are broken!)

void strlen_test()
{
    HSHG_ASSERT( strlen( "1245" ) == 5 );
}

void sum_test()
{
    int total = 0;

    for( int i = 1; i <= 15; ++i )
    {
        total += 1;
        total += 7;
    }

    for( int i = 2; i <= 7; ++i )
    {
        total += i;
    }

    HSHG_ASSERT_DESC( total == 148, "Maximum break is 148" );
}

class MyThrowable
{
public:
    static void ThrowMe()
    {
        throw MyThrowable();
    }
};

void MyThrowable_test()
{
    HSHG_ASSERT_THROWS( true ? 0 : (MyThrowable::ThrowMe(), 0), MyThrowable );
}
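For readers wondering how macros of this kind can work, here is a rough sketch. It is not the HSHGTest source: the SKETCH_ names and the SketchFailure class are invented for illustration, and the only assumption carried over from the post is that a failing assertion throws an exception.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type that records where an assertion failed.
class SketchFailure : public std::runtime_error
{
public:
    SketchFailure( const char* file, int line, const char* desc )
        : std::runtime_error( Format( file, line, desc ) ) {}

private:
    static std::string Format( const char* file, int line, const char* desc )
    {
        std::ostringstream s;
        s << file << ":" << line << ": ( " << desc << " )";
        return s.str();
    }
};

// Fail with a caller-supplied description.
#define SKETCH_ASSERT_DESC( cond, desc ) \
    do { if( !( cond ) ) throw SketchFailure( __FILE__, __LINE__, desc ); } while( 0 )

// Fail with the text of the condition itself as the description.
#define SKETCH_ASSERT( cond ) SKETCH_ASSERT_DESC( cond, #cond )

// Fail unless evaluating expr throws something catchable as ExType.
#define SKETCH_ASSERT_THROWS( expr, ExType ) \
    do { \
        bool sketch_caught = false; \
        try { (void)( expr ); } catch( const ExType& ) { sketch_caught = true; } \
        if( !sketch_caught ) \
            throw SketchFailure( __FILE__, __LINE__, "Exception " #ExType " expected." ); \
    } while( 0 )
```

The do/while wrapper makes each macro behave like a single statement, and __FILE__ and __LINE__ expand at the point of use, which is what lets a report name the line of the failing assertion.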

The test namespace is HSHGTest and there is a struct TestFn for a test function which contains a char* for the function name and a pointer to the function itself.

There is then an inline function called RunTests that takes an array of these structs (terminated by one with a null function pointer); it runs each test in the array and reports to a given std::ostream; it then returns EXIT_SUCCESS if it ran some tests and they all passed, and EXIT_FAILURE otherwise. This makes it suitable for returning the result directly from a main function. Here is an example report.

testtest.cc:6: Test strlen_test failed. ( strlen( "1245" ) == 5 )
testtest.cc:24: Test sum_test failed. ( Maximum break is 148 )
testtest.cc:38: Test MyThrowable_test failed. ( Exception MyThrowable expected. )

If this sounds a bit laborious then there are some helper macros to set a suitable array up, and even a macro for a default main function:

HSHG_BEGIN_TESTS
HSHG_TEST_ENTRY( strlen_test )
HSHG_TEST_ENTRY( sum_test )
HSHG_TEST_ENTRY( MyThrowable_test )
HSHG_END_TESTS

HSHG_TEST_MAIN

These macros translate (roughly) into:

namespace
{

HSHGTest::TestFn tests[] =
{
    { "strlen_test", strlen_test },
    { "sum_test", sum_test },
    { "MyThrowable_test", MyThrowable_test },
    { NULL, NULL }
};

}

int main()
{
    return HSHGTest::RunTests( tests, std::cout );
}

The "framework" is available for download here: HSHGTest framework

Address Space Monitor vs Google

Sunday, October 28th, 2007

In an unusual coincidence, googlebot visited my homepage the day after I put the link to Address Space Monitor live, and the next day it was the top search result for the query: ‘Address Space Monitor’. It stayed there for a few days but now the asm homepage is not in Google’s index at all. I’ve logged in to Google’s Webmaster Tools but so far they haven’t shed any light on the situation.

[Edit: Monday morning] And now it’s back in at #1. It must be a glitch in the google…

Address Space Monitor

Tuesday, October 23rd, 2007

Finally, I manage to release software on the unsuspecting world!

I wrote this tool in response to spending some painful time debugging a process which seemed unable to allocate a chunk of memory when most conventional tools were showing that the process wasn’t at the limit in terms of memory usage and the system hadn’t run out of swap space. The problem was virtual address space fragmentation.

Address Space Monitor is a Windows tool that shows graphically how a process’s address space has been carved up, and how big the biggest blobs of contiguous free memory are and where they sit in the address space.
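The effect of fragmentation is easy to show with a toy model. The sketch below is not how Address Space Monitor works; it takes a made-up list of in-use regions (the Region and FreeSummary types are hypothetical) and reports the total free space alongside the largest single free block, which is the figure that decides whether a big allocation can succeed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Region            // hypothetical: one in-use range of the address space
{
    std::size_t base;
    std::size_t size;
};

struct FreeSummary
{
    std::size_t total_free;     // sum of all the gaps
    std::size_t largest_free;   // size of the biggest single gap
    std::size_t largest_base;   // where that gap starts
};

inline bool ByBase( const Region& a, const Region& b ) { return a.base < b.base; }

// Walk the in-use regions in address order and measure the gaps between them.
inline FreeSummary SummariseFree( std::vector<Region> used, std::size_t space_size )
{
    std::sort( used.begin(), used.end(), ByBase );

    FreeSummary s = { 0, 0, 0 };
    std::size_t cursor = 0;   // first address not yet accounted for

    for( std::size_t i = 0; i <= used.size(); ++i )
    {
        // After the last region, the "next base" is the end of the space.
        const std::size_t next = ( i < used.size() ) ? used[i].base : space_size;
        if( next > cursor )
        {
            const std::size_t gap = next - cursor;
            s.total_free += gap;
            if( gap > s.largest_free )
            {
                s.largest_free = gap;
                s.largest_base = cursor;
            }
        }
        if( i < used.size() )
            cursor = std::max( cursor, used[i].base + used[i].size );
    }
    return s;
}
```

With three 50-unit regions at 100, 300 and 500 in a 600-unit space, 450 units are free in total but no single request larger than 150 units can be satisfied.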

Naturally, if you are using the tool in earnest, the process which is giving you trouble will inevitably be resource heavy and slow. Hence, Address Space Monitor (ASM henceforth – not to be confused with assembly language source files) has been written to minimise its own resource usage while retaining boredom alleviating features such as fun colours and a bouncy CPU meter. You cannot yet use it to read mail, so it is still classed as “in development”. Oh yes, and the ‘a’ at the end of 0.5a means that it is alpha software.

Oh, where is it?

<c3po>Over Here!</c3po>

Template Metaprogramming Errors

Friday, September 14th, 2007

I’ve been having a very limited play with template metaprogramming and at one point I managed to end up with what appeared to be an infinitely recurring error message. It turns out that it wasn’t, as I was able to redirect all of the stderr output from the compiler to a file. It was, however, 22 megabytes.

$ g++ -Wall -pedantic -std=c++98 -O2 metatest.cc 2>dmp
$ wc -c dmp
22163606 dmp

Not bad for a 2k source file.

$ wc -c metatest.cc
2280 metatest.cc

I think a sensible compiler limit prevented the error message from getting “too” large.

metatest.cc:81: error: template instantiation depth exceeds maximum of 500
    (use -ftemplate-depth-NN to increase the maximum)
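The classic way to run into this limit is a recursive template with no terminating specialisation. The sketch below is not metatest.cc (that file isn’t shown here); it is the textbook factorial, with a made-up name, in its correct form.

```cpp
// Primary template: each instantiation requires the next one down.
template< unsigned N >
struct Factorial
{
    static const unsigned long value = N * Factorial< N - 1 >::value;
};

// Terminating specialisation. Without it, Factorial<0> would ask for
// Factorial<0u - 1>, which wraps round to a huge unsigned value, and the
// compiler would keep instantiating until it reached its depth limit.
template<>
struct Factorial< 0 >
{
    static const unsigned long value = 1;
};
```

Every level of the failed recursion gets its own “instantiated from” line in the diagnostic, which is how a small source file can produce a very large error log.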

More obscure places to find the template keyword

Thursday, July 26th, 2007

Consider the following ill-formed C++ code:

class A
{
public:
    template< class U > U f();
};

template< class T , class U >
class B
{
public:
    T g()
    {
        // error: expected primary-expression before ‘>’ token
        return u.f< T >();
    }

private:
    U u;
};

template<> int A::f<int>() { return 0; }

int main()
{
    B< int, A > b;
    return b.g();
}

The class template B is “obviously” designed for use with a class (such as class A) which has a member function template f() which takes no parameters. Because the template argument for f cannot be deduced, we need to provide it explicitly with the f<type> syntax. However, because the type of u depends on the template parameter U of B, the compiler cannot know, when it parses B::g, whether f names a template or not. If not, the token < really could just mean “less than”. To disambiguate, the template keyword must be used in the definition of B::g, thus:

template< class T , class U >
class B
{
public:
    T g()
    {
        return u.template f< T >();
    }

private:
    U u;
};

A lot of compilers accept the ill-formed code without the template keyword, so you might not have noticed this. Fortunately most compilers do accept the correct code as well, so there should be no reason not to be correct.

Note that in C++98 it is illegal to use the template keyword in this context (member function specifier) outside of a template declaration. Outside of a template definition there are no dependent types so the compiler can always resolve whether an identifier refers to a template or a non-template, so there can be no ambiguity. (C++11 later relaxed this restriction.)
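The same disambiguation is needed after -> and :: as well as after the dot. Here is a small sketch showing all three forms; the Source class and the via_ function names are invented for illustration.

```cpp
// A class with a non-static and a static member function template.
struct Source
{
    template< class T > T get() const { return T( 42 ); }
    template< class T > static T make() { return T( 7 ); }
};

// In each function below U is a template parameter, so the member name that
// follows is dependent and needs the template keyword before its argument list.

template< class T, class U >
T via_dot( const U& u )
{
    return u.template get< T >();
}

template< class T, class U >
T via_arrow( const U* u )
{
    return u->template get< T >();
}

template< class T, class U >
T via_scope()
{
    return U::template make< T >();
}
```

A call such as via_dot< int >( Source() ) supplies T explicitly and lets U be deduced from the argument, while via_scope has nothing to deduce from and needs both arguments spelled out.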

Cross-compiling gcc, glibc and all that. (Part II – the script)

Wednesday, July 18th, 2007

crosscomp.sh

So here it is, the script that does it all. Although it is a shell script, I thoroughly recommend not running it, but cutting and pasting it, section by section, into a terminal session so that you can fix up any environmental issues that cause it to fall over. There are some paths at the top of the script that you probably want to fix before starting. I ran the whole script on a PIII system and it churned through the entire process in just under an hour.

The basic procedure is to install a “cross” binutils (gas, ld, etc.), install kernel headers and glibc headers into a fake x86_64 system root, then compile a minimal gcc cross compiler without making any target libraries, then use this to compile a glibc for the target platform, then use the resulting glibc to build a full gcc with c++ support and the required target libraries. The reason for making two gcc is that making a full gcc requires some parts of the target glibc to be built with the cross-compiler that you haven’t yet built.

The script has six steps and three “fudges”, which I consider to be quite an achievement on the fudge count.

Important note: building glibc with nptl (the native POSIX threads library for Linux) does not work with the “stage one” minimal gcc. This is a big problem as glibc-2.6 only comes with nptl, as far as I’m aware, and you are stuck. glibc-2.5 has an add-on for the “old” linuxthreads support and you can build this with the “stage one” gcc compiler. Once you’ve built the full gcc with the “old” threads supported glibc-2.5 you can go back and build a full version of glibc-2.6 with nptl with this gcc. Once you’ve done this, if you’re feeling uncertain, you can go back and completely rebuild a full gcc on top of this glibc-2.6, just in case it makes a difference. Once you’ve done this, if you’re feeling very uncertain, you can completely rebuild glibc-2.6 with the brand new gcc, just in case it makes a difference. Once you’ve done this…

I’d really love to find a fix for getting the whole process to work without having to fall back to an old version of glibc. It feels (ha!) even more hacky than the rest of the process feels.

You’ll need:

binutils-2.16.1.tar.bz2
linux-2.6.20.1.tar.bz2
glibc-2.5.tar.bz2
glibc-linuxthreads-2.5.tar.bz2
gcc-core-4.2.0.tar.bz2
gcc-g++-4.2.0.tar.bz2

Like all such documents, this list will seem out of date before I’ve posted. By all means try the process with gcc 4.4, linux 2.8, glibc 2.7 and binutils 2.18, but the fudges will probably all have to be updated before it works.

Cross-compiling gcc, glibc and all that.

Monday, July 16th, 2007

Yippee! I finally managed to get a working cross-compiler x86 -> x86_64 on one of my linux boxes, set up with c++ and shared library support. gcc, the kernel and glibc have the most annoying set of interdependencies so doing a ground up build is horribly painful, consisting of several rounds of bootstrapping. Like most people who manage this I’m so fed up with the whole process that I can’t be bothered to document what I did right now… however, I want to do it all again so that I can verify that I can do it from clean. When I do, I shall attempt to document the cleanest way to do this.

Just so this post contains one useful thing, this was the best resource that I found, so thank you to “vapier” @ “gentoo”. http://dev.gentoo.org/~vapier/CROSS-COMPILE-GUTS

Building code (Part II) – dependency generation

Wednesday, June 27th, 2007

Automatic dependency generation can make a huge difference to productivity. If you have a large project then building every source file, every time, in a code and fix cycle can grind the process to a halt. Likewise, if your build process doesn’t rebuild any object file that already exists, or only rebuilds it when the corresponding source file has been updated without taking into account updated header files, then you can end up chasing phantom bugs due to incompatible object files. To get around this, without taking the expensive hit of a full rebuild, you tend to end up manually deleting groups of critical object files which you think are the affected ones and attempting to use an incremental build.

A working autodependency system should make incremental builds as minimal as possible, but no more minimal than that. Every time you hit make, everything that should be rebuilt is, and there is no manual upkeep of complex source file dependencies.

For some time now, many compilers have provided an alternative preprocessing switch which, instead of outputting the normal preprocessed code, outputs a makefile fragment which describes the object file dependencies on the source file and all the header files which are included, both directly and indirectly. This fragment, which contains dependency only rules (i.e. they do not specify a build command) can then be included in a larger makefile to form a functioning makefile with complete dependencies.

gcc has the -M switch, which works as described, and the -MM switch, which works similarly but omits system header files. I tend to favour the latter since system header files change infrequently and you usually know when they have (e.g. a major system upgrade). When such an event occurs, usually every file in the project is outdated anyway, so a manual clean is no particular hardship. The generated makefiles without the system header files are usually a lot more compact.

For a file test.c that includes test.h but no other non-system header files, you usually get a rule in the generated test.d makefile which is something like this:

test.o: test.c test.h

This is exactly what is required so usually you place a rule in the project makefile along the lines of:

test.d: test.c
    gcc -MM -o test.d test.c

include test.d

Due to the magic way make works, make will spot that while it can’t directly include test.d as it doesn’t yet exist, there is a rule to make it. Make will then make it – and any other included makefile that is not up to date that it has a rule for making – and restart the original makefile parsing step so that it can now include this makefile.

This is all well and good: when you change test.h to #include "test2.h", make knows that test.o is out of date and needs to be rebuilt. The problem is that test.o now depends indirectly on test2.h, but test.d has not been regenerated to reflect this, so a later change to test2.h alone will not trigger a rebuild. Previously, in the dark ages, the standard technique was to change the rule so that, instead of generating the test.d file directly, it passed the compiler output through a really ugly sed script. The script would replace the occurrences of ‘test.o’ with ‘test.d test.o’ so that test.d had the same dependencies as test.o and was correctly updated when the dependencies of test.o were updated. (What makes the sed script really ugly is usually the fact that it is defined in a pattern rule such as %.d: %.c and has to work with the make automatic variables like $@ and $< as well as regular expression syntax to do its magic. Sometimes the rule uses a temporary file for the original compiler dependency output, sometimes you are able to get the compiler to pipe it straight into sed.)

The final thing that always used to irritate me about the sed script is that usually the compiler generates a makefile where the lines are all kept neatly to 80 column lines and line continuation syntax (’\’ newline) is used to avoid line wrapping. Passing this through a sed script adding a dependency almost always causes the first line to wrap. Who cares? Nobody reads automatically generated dependency makefiles and it doesn’t affect their functionality! I know it shouldn’t matter, but I don’t need to look at them, I know that they’re badly formatted makefiles sitting there and it irritates me.

Fortunately, with modern gcc, there is a better way. You can use -MF to specify the output file to write the dependency rule and then you can use either -MT or -MQ to specify what you want to appear as targets to the generated dependency rule in the makefile that is written out. In our case we want the following:

gcc -MM -MF test.d -MT test.o -MT test.d test.c

The -MQ option does the same thing as -MT but automatically escapes any characters that are special to make, so that when the resulting makefile is parsed by make everything reads correctly. Sounds like a really useful feature, but none of my source files have a $ in their name, so I haven’t really found a need for it.
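To make the shape of the generated rule concrete, here is a toy sketch that builds the same kind of line for a source file held in a string. It is emphatically not how gcc does it (gcc runs the real preprocessor and follows nested includes); the function names are invented, and the scanner only recognises direct #include "..." lines.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collect the file names from lines of the form:  #include "name"
// Angle-bracket (system) includes are skipped, much as -MM skips them.
inline std::vector<std::string> QuotedIncludes( const std::string& source )
{
    std::vector<std::string> names;
    std::istringstream in( source );
    std::string line;

    while( std::getline( in, line ) )
    {
        std::string::size_type p = line.find_first_not_of( " \t" );
        if( p == std::string::npos || line[p] != '#' )
            continue;

        p = line.find_first_not_of( " \t", p + 1 );
        if( p == std::string::npos || line.compare( p, 7, "include" ) != 0 )
            continue;

        p = line.find_first_not_of( " \t", p + 7 );
        if( p == std::string::npos || line[p] != '"' )
            continue;

        const std::string::size_type close = line.find( '"', p + 1 );
        if( close != std::string::npos )
            names.push_back( line.substr( p + 1, close - p - 1 ) );
    }
    return names;
}

// Build a rule with both the object file and the dependency file as targets,
// mirroring the effect of -MT test.o -MT test.d.
inline std::string DepRule( const std::string& stem, const std::string& source )
{
    std::string rule = stem + ".o " + stem + ".d: " + stem + ".c";
    const std::vector<std::string> names = QuotedIncludes( source );
    for( std::size_t i = 0; i < names.size(); ++i )
        rule += " " + names[i];
    return rule;
}
```

Because test.d appears as a target alongside test.o, anything that makes the object file out of date also makes the dependency file out of date, so make regenerates it without any help from sed.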

The final icing on the cake with gcc is that you can use the -MD and -MMD options. These do exactly the same thing as the -M and -MM options except that they don’t stop the rest of the processing, so you can go on and complete the compilation of the source file in question. In effect, they make the dependency generation a side effect of the compile step.

test.o test.d: test.c
    gcc -MMD -MT test.d -c -o test.o test.c

In this case, gcc automatically includes an implied -MQ with whatever the output file is, so any other -MQ and -MT options are added to this.

In practice, I don’t really like this last form of dependency generation. Every time you run make all the *.d files are made up to date (they are included in the overall makefile so they must be remade before any specified targets are built). As the *.d files are a side effect of compilation, this means that all the out of date object files are automatically remade. In effect, compilation becomes a side effect of dependency generation. This has the slightly bizarre effect that if you have no generated dependencies and no object files (a build from ‘distclean’), then running a ‘make clean’ – just to be sure – compiles all your object files and generates all your dependencies and then immediately deletes all the object files again.

So I use the explicit “dependency only” method to generate dependency makefiles.

[Note: For simplicity I have only really described explicit rules. Typically dependency makefiles are made through pattern rules of the form %.d: %.c using automatic variables such as $@ in the command syntax. Also for simplicity I have left out the compiler preprocessor and include options. For reliable dependency generation these should be identical to the options used in the compile step itself.]

Building Code (Part I) – Makefiles

Tuesday, June 26th, 2007

Makefiles are one of the things that I don’t like spending a lot of time working with. They’re there to support writing software, so they should be easy and simple to work with, and they should “just work”.

So naturally, I’ve spent a lot of time working on a setup that “just works”.

There are a number of approaches to make systems which have various strengths and weaknesses.

One approach is to have the makefiles generated by a higher level system, usually written in a scripting language. Makefile syntax isn’t a programming language, so if you attempt to do too much programming with it you can end up fighting against the tool rather than working with it, which makes this initially seem like an attractive idea. The downside is that you have to “regenerate” makefiles and this can be slower than necessary and, being outside of make, it is not easy to check when the makefiles need to be generated or to regenerate them incrementally. Some systems also generate makefiles which have poor dependency support, or which use make in a recursive fashion. Most systems are more restrictive than using make directly as they enforce a pattern. This can be considered a strength, as not following a set pattern can lead to a maintenance nightmare, but the downside is that you can end up fighting the system to do anything slightly out of the ordinary.

Oh, and if you haven’t read Peter Miller’s 1997 paper “Recursive Make Considered Harmful”, then I thoroughly recommend it.

The alternative is to manage one or more makefiles manually. Evidently makefiles can grow into unmaintainable monsters if they are not carefully managed so the usual way to manage this is by using a project “template” with well defined places where component specific files and settings are specified. This is my preferred approach but getting the template correct is the trick.

There are a number of features that I would like to see in a makefile system.

  • Dependencies for C and C++ sources should be generated automatically.
  • It must be easy to add new projects and new files to existing projects.
  • It needs to support alternative source files, and generated source files (e.g. assembler files, [f]lex, yacc/bison files)
  • It must be possible to build different configurations from the same source. Which implies…
  • It must build automatically into a separate build directory, so .o files shouldn’t appear in the same directories as .c[c] files.

Is it possible to do all this with a “templated” make system?

(Hint: The answer is yes.)