Archive for July, 2007

restrict, part II = x86_64

Thursday, July 26th, 2007

Just to make some use of my cross compiler, here’s how the assembler for x86_64 stacks up for the restrict example which I posted before.

Without restrict:

fibincr2:
    movl    (%rdi), %eax  # eax = *a    A
    addl    (%rsi), %eax  # eax += *b   A + B
    movl    %eax, (%rdi)  # *a = eax    A + B
    movl    (%rsi), %edx  # edx = *b    B
    subl    %eax, %edx    # edx -= eax  -A
    movl    %edx, (%rsi)  # *b = edx    -A
    movl    (%rdi), %eax  # eax = *a    A + B
    subl    %edx, %eax    # eax -= edx  2A + B
    movl    %eax, (%rdi)  # *a = eax    2A + B
    addl    %eax, (%rsi)  # *b += eax   A + B
    ret

6 mov, 2 sub (2 register only), 2 add

With restrict:

fibincr2:
    movl    (%rdi), %eax  # eax = *a    A
    movl    %eax, %edx    # edx = eax   A
    addl    (%rsi), %edx  # edx += *b   A + B
    negl    %eax          # eax = -eax  -A
    movl    %eax, (%rsi)  # *b = eax    -A
    subl    %eax, %edx    # edx -= eax  2A + B
    addl    %edx, (%rsi)  # *b += edx   A + B
    movl    %edx, (%rdi)  # *a = edx    2A + B
    ret

4 mov (1 register only), 1 sub (1 register only), 2 add, 1 neg (1 register only)

Alternate algorithm:

fibincr2:
    movl    (%rdi), %edx
    leal    (%rdx,%rdx), %eax
    addl    (%rsi), %eax
    addl    %edx, (%rsi)
    movl    %eax, (%rdi)
    ret

2 mov, 2 add, 1 lea.

So the instruction counts for each algorithm stay exactly the same as before, but because the x86_64 calling convention passes arguments in registers, there is no additional overhead from stack-access instructions.
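The source for these listings is in the earlier post and is not repeated here. As a rough guide only, the register comments suggest something like the sketch below. The three function names are made up so that the variants can sit side by side, the bodies are a reconstruction rather than the original code, and `__restrict__` is the GCC/Clang extension spelling (standard C++ has no restrict keyword).

```cpp
#include <cassert>

// Hypothetical reconstruction from the assembly comments, not the original source.
// Without restrict, every store through one pointer may change what the other
// pointer sees, so the compiler has to reload from memory after each store.
void fibincr2_plain(int* a, int* b)
{
    *a += *b;   // A + B
    *b -= *a;   // -A
    *a -= *b;   // 2A + B
    *b += *a;   // A + B
}

// Same statements, but the caller promises that a and b never alias,
// which lets the compiler keep intermediate values in registers.
void fibincr2_restrict(int* __restrict__ a, int* __restrict__ b)
{
    *a += *b;
    *b -= *a;
    *a -= *b;
    *b += *a;
}

// A guess at the "alternate algorithm": compute both results from the old
// values directly, matching the lea/add sequence in the last listing.
void fibincr2_alt(int* a, int* b)
{
    const int old_a = *a;
    const int new_a = 2 * old_a + *b;   // 2A + B
    *b += old_a;                        // A + B
    *a = new_a;
}
```

All three agree whenever the two pointers refer to different objects; passing the same address for both arguments to the restrict version would break the promise made to the compiler.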

More obscure places to find the template keyword

Thursday, July 26th, 2007

Consider the following ill-formed C++ code:

class A
{
public:
    template< class U > U f();
};

template< class T , class U >
class B
{
public:
    T g()
    {
        // error: expected primary-expression before ‘>’ token
        return u.f< T >();
    }

private:
    U u;
};

template<> int A::f<int>() { return 0; }

int main()
{
    B< int, A > b;
    return b.g();
}

The class template B is “obviously” designed for use with a class (such as class A) that has a member function template f() taking no parameters. Because the template argument for f cannot be deduced, we need to provide it explicitly with the f<type> syntax. However, u has the dependent type U, so when the compiler parses B::g it cannot know whether f names a template or not. If not, the token < really could just mean “less than”. To disambiguate, the template keyword must be used in the definition of B::g, thus:

template< class T , class U >
class B
{
public:
    T g()
    {
        return u.template f< T >();
    }

private:
    U u;
};
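To see why the parser needs the hint, here is a small sketch in which f really is a data member and < really does mean “less than”. The names HasInt and less_then_greater are invented for this illustration.

```cpp
#include <cassert>

// Hypothetical type where f is an int data member, not a member template.
struct HasInt
{
    int f;
};

// Without the template keyword, the dependent name u.f is assumed to be a
// non-template, so the return expression parses as ( u.f < N ) > ( 0 ).
template< int N, class U >
bool less_then_greater(U u)
{
    return u.f < N > (0);
}
```

The token sequence has the same shape as u.f< T >(), which is why the compiler cannot simply guess that a template was meant.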

A lot of compilers accept the ill-formed code without the template keyword, so you might not have noticed this. Fortunately most compilers do accept the correct code as well, so there should be no reason not to be correct.

Note that in C++03 it is illegal to use the template keyword in this context (as a member function specifier) outside of a template declaration; C++11 later relaxed this rule. Outside of a template definition there are no dependent types, so the compiler can always resolve whether an identifier refers to a template or a non-template, and there can be no ambiguity.
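The same disambiguation applies after -> and ::, not just after the dot. The following is a minimal sketch with made-up names (Source, make, Box and the via_ functions are all hypothetical) showing the three spellings inside templates:

```cpp
#include <cassert>

// Hypothetical class with a member function template and a member class template.
struct Source
{
    template< class U > U make() const { return U(42); }
    template< class U > struct Box { U value; };
};

// Member access with '.' on an object of dependent type.
template< class T, class S >
T via_dot(const S& s)
{
    return s.template make< T >();
}

// Member access with '->' on a pointer to a dependent type.
template< class T, class S >
T via_arrow(const S* s)
{
    return s->template make< T >();
}

// Qualified name: typename says "this is a type", template says
// "Box is a template", and both are needed here.
template< class T, class S >
T via_scope()
{
    typename S::template Box< T > box = { T(7) };
    return box.value;
}
```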

Cross-compiling gcc, glibc and all that. (Part II – the script)

Wednesday, July 18th, 2007

crosscomp.sh

So here it is, the script that does it all. Although it is a shell script, I thoroughly recommend not running it, but cutting and pasting it, section by section, into a terminal session so that you can fix up any environmental issues that cause it to fall over. There are some paths at the top of the script that you probably want to fix before starting. I ran the whole script on a PIII system and it churned through the entire process in just under an hour.

The basic procedure is to install a “cross” binutils (gas, ld, etc.), install kernel headers and glibc headers into a fake x86_64 system root, then compile a minimal gcc cross compiler without making any target libraries, then use this to compile a glibc for the target platform, then use the resulting glibc to build a full gcc with C++ support and the required target libraries. The reason for building gcc twice is that a full gcc needs some parts of the target glibc, and those have to be built with the cross-compiler that you haven’t yet built.

The script has six steps and three “fudges”, which I consider to be quite an achievement on the fudge count.

Important note: building glibc with nptl (the native POSIX threads library for Linux) does not work with the “stage one” minimal gcc. This is a big problem as glibc-2.6 only comes with nptl, as far as I’m aware, and you are stuck. glibc-2.5 has an add-on for the “old” linuxthreads support and you can build this with the “stage one” gcc compiler. Once you’ve built the full gcc against the glibc-2.5 with “old” threads support, you can go back and build a full version of glibc-2.6 with nptl using this gcc. Once you’ve done this, if you’re feeling uncertain, you can go back and completely rebuild a full gcc on top of this glibc-2.6, just in case it makes a difference. Once you’ve done this, if you’re feeling very uncertain, you can completely rebuild glibc-2.6 with the brand new gcc, just in case it makes a difference. Once you’ve done this…

I’d really love to find a fix for getting the whole process to work without having to fall back to an old version of glibc. It feels (ha!) even more hacky than the rest of the process feels.

You’ll need:

binutils-2.16.1.tar.bz2
linux-2.6.20.1.tar.bz2
glibc-2.5.tar.bz2
glibc-linuxthreads-2.5.tar.bz2
gcc-core-4.2.0.tar.bz2
gcc-g++-4.2.0.tar.bz2

Like all such documents, this list will seem out of date before I’ve posted. By all means try the process with gcc 4.4, linux 2.8, glibc 2.7 and binutils 2.18, but the fudges will probably all have to be updated before it works.

Cross-compiling gcc, glibc and all that.

Monday, July 16th, 2007

Yippee! I finally managed to get a working x86 -> x86_64 cross-compiler set up on one of my Linux boxes, with C++ and shared library support. gcc, the kernel and glibc have the most annoying set of interdependencies, so doing a ground-up build is horribly painful, consisting of several rounds of bootstrapping. Like most people who manage this I’m so fed up with the whole process that I can’t be bothered to document what I did right now… however, I want to do it all again so that I can verify that I can do it from clean. When I do, I shall attempt to document the cleanest way to do this.

Just so this post contains one useful thing, this was the best resource that I found, so thank you to “vapier” @ “gentoo”. http://dev.gentoo.org/~vapier/CROSS-COMPILE-GUTS

make – thank you for the music

Wednesday, July 11th, 2007

Make doesn’t have to be just for complex code building applications.

I have my own postgres database of the albums that I own and have ripped, mainly because I’m a bit picky about the formatting of tag metadata.

I have a script that you can point at a musicbrainz album entry; it will download the entry in xml format and insert it into my metadata repository (utf-8, of course), from where I can tweak it as I like in a perl-based webapp which has one really cool killer feature. You can download a zip file with a windows .cmd file and a bunch of metadata text files which, when unzipped in the same directory as a bunch of Track??.wav files, will losslessly compress all of the wav files into my music directory (under appropriately named artist and album sub-directories), tag them all with the correct metadata and finally run a “flac -t” test decompression over all the newly compressed files.

The cool thing is that I can rip my newly arrived album and start listening to it in wav format in the temporary directory into which I’ve ripped it, and take the time to sort out the metadata (spacing, capitalization, special characters) at my leisure and compress, tag and store replaygain data when I’m ready.

The one thing that has been bugging me since I set up my dual core machine is that the compression process uses only one core, as flac runs as a single thread and the cmd file just runs commands sequentially. I’ve popped the lid off the flac libraries before so I vaguely considered hacking in some multithread, multifile feature but this seemed like hard work. I thought about getting the cmd script to spawn two separate command processors each with half the files, but then one might finish early if the track lengths or complexity happened to be radically different. Perhaps I could spawn each compression instance in the background and run them all in parallel, but it doesn’t seem very neat spawning a dozen processes all at once and I’d have no neat way of waiting for them all to finish before running the test. Finally I realised that this is really a job for make.

So I tweaked the cmd file generating webapp to create me a makefile instead and now I run “make -j 2” from the working directory. Hey presto, both cores are used and the compression takes about half the time. Here’s a quick sample:

.PHONY: all test

all: test

$(MYMUSIC)/Artist/Album/01_TrackOneTitle.flac: Track01.wav
	flac $< -o $@
	metaflac --remove-all-tags --no-utf8-convert --import-tags-from=track01.txt $@

$(MYMUSIC)/Artist/Album/02_TrackTwoTitle.flac: Track02.wav
	flac $< -o $@
	metaflac --remove-all-tags --no-utf8-convert --import-tags-from=track02.txt $@

# etc, etc, ...

OUTFILES := $(MYMUSIC)/Artist/Album/01_TrackOneTitle.flac \
 $(MYMUSIC)/Artist/Album/02_TrackTwoTitle.flac
# etc, etc, ...

test: $(OUTFILES)
	flac -t $(OUTFILES)

You can add tags with the flac commandline, but I use metaflac and text files because the commandline doesn't handle utf-8 so well. It's more reliable to generate tag import files and tell metaflac to import them straight into the vorbis comment block of the flac file without translation. It just works, and now at twice the speed.
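As a footnote to the worry about splitting the files into two fixed halves, here is a toy model of the difference between that and a job pool where whichever worker is free takes the next track. It is only a sketch: the function names and the per-track costs used with it are invented, and it ignores everything make really does with dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Elapsed time when the track list is cut into two fixed halves up front
// and each half is compressed sequentially by its own worker.
int static_split_time(const std::vector<int>& cost)
{
    const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(cost.size() / 2);
    const int first = std::accumulate(cost.begin(), cost.begin() + half, 0);
    const int second = std::accumulate(cost.begin() + half, cost.end(), 0);
    return std::max(first, second);
}

// Elapsed time when each track, in order, goes to whichever worker
// becomes free first (a simple model of a fixed-size job pool).
// Assumes workers is at least 1.
int job_pool_time(const std::vector<int>& cost, std::size_t workers)
{
    std::vector<int> busy_until(workers, 0);
    for (std::size_t i = 0; i < cost.size(); ++i)
    {
        std::vector<int>::iterator next_free =
            std::min_element(busy_until.begin(), busy_until.end());
        *next_free += cost[i];
    }
    return *std::max_element(busy_until.begin(), busy_until.end());
}
```

With three long tracks followed by five short ones, the fixed split leaves one worker idle for most of the run, while the pool keeps both busy until nearly the end.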