Archive for the ‘Random’ Category

git backups interacting with git

Wednesday, December 5th, 2007

This is really important!

git as a generalized backup utility interacts with any git repositories that it finds in an ‘interesting’ way.

It treats them as a submodule, so instead of backing up the git repository, it just records a reference to the current HEAD of the submodule.

I believe that this is “by design”, but if you don’t set up the submodule configuration your backup repository won’t know where to find the correct repository with the recorded commit.

It also means that you need to be git pushing your precious repository data somewhere safe in any case.
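To see in advance which directories will end up recorded only as a commit reference, you can list the nested repositories under the directory being backed up. This is a minimal sketch, assuming a find that supports -mindepth (GNU and BSD find do); list_nested_repos is a made-up helper name, not part of git.

```shell
#!/bin/sh
# list_nested_repos DIR: print each directory below DIR that holds its own .git
# (hypothetical helper; -mindepth 2 skips a .git sitting directly in DIR itself).
list_nested_repos() {
    find "$1" -mindepth 2 -name .git -prune -print | sed 's|/\.git$||' | sort
}
```

Running list_nested_repos /home before a backup gives the list of repositories whose contents need to be pushed somewhere safe by other means.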

git backup is also about three times slower than my tar based incremental backup, although incrementally saving the backups to a remote machine is quicker and backup browsing and recovery is a little easier.

git as a general purpose backup utility

Monday, December 3rd, 2007

When it was first suggested to me that you could just use git for backup I was not convinced. You would have these massive .git directories in high level places on your filesystem for one.

Now that I’ve had some time to reflect on the possibility I think that perhaps it isn’t such a crazy idea. It’s not actually true that you have to have a .git directory in the place that you want to back up. In fact, I am even trialling git alongside my regular “tar” based backup.

Here’s what I do. Suppose, for the sake of example, that I’m going to backup /home onto a separate backup partition called /backup.

Step 1 – Create a git repository for the backup

mkdir /backup/home.git
git --git-dir=/backup/home.git --work-tree=/home init

[
I used to do this as follows before I discovered the --work-tree option to git. It has the same effect.

git --bare init
git config core.bare false
git config core.worktree /home

]

Step 2 – Initial backup

cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -m "Initial /home backup"

Step 3 – Copy backups to a safe remote machine
Assuming that you have a second machine, with git installed and reachable over ssh, where you want to store your backups, you can initialize a new empty git repository there for this purpose.
Suppose that this machine is called other-machine and the repository is located at /backup/first-machine/home.git.

The initial remote backup is performed thus.

cd /backup/home.git
git remote add other-machine ssh://other-machine/backup/first-machine/home.git
git gc
git push other-machine master

The git gc seems fairly important. At this stage you have a massive git repository that hasn’t yet been packed. When you attempt to push it, git will want to perform a big “Deltifying” step to create a pack on the remote side. If you perform the git gc on the local side first it will perform the big “Deltifying” step and effectively store the results as a pack on the local side. The git push can use this and, having done the gc, subsequent local operations can also take advantage of the local pack whereas just letting the push do the pack would lose the work done from the local side.

Step 4 – Incremental backup

cd /home
git --git-dir=/backup/home.git add .
git --git-dir=/backup/home.git commit -a -m "Incremental /home backup"

Performing both an “add” and a “commit -a” looks repetitive but is required as “commit -a” does not add new untracked files and “add” doesn’t ‘add’ file deletions to the index.
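The add-then-commit pair is easy to wrap in a small script for cron. What follows is only a sketch: the variable names, the run helper and the DRY_RUN switch are invented for illustration, and the default paths are the ones from the example.

```shell
#!/bin/sh
# Hypothetical settings; the defaults match the /home example.
GIT_DIR_PATH=${GIT_DIR_PATH:-/backup/home.git}
WORK_TREE=${WORK_TREE:-/home}

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# incremental_backup: stage new and changed files, then commit with -a so that
# deletions are recorded too. The body runs in a subshell so the caller's
# working directory is left alone.
incremental_backup() (
    stamp=$(date +%F_%H:%M)
    cd "$WORK_TREE" || exit 1
    run git --git-dir="$GIT_DIR_PATH" add . || exit 1
    run git --git-dir="$GIT_DIR_PATH" commit -a -m "Incremental $WORK_TREE backup $stamp"
)
```

Setting DRY_RUN=1 before calling incremental_backup prints the two git commands without touching the repository. Note that git commit exits non-zero when there is nothing to commit, so a cron wrapper probably shouldn't treat that as a failure.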

Step 5 – Push incremental backup to remote machine

cd /backup/home.git
git push other-machine master

Well, that was easy.

Disadvantages
The initial “git gc” step can be very slow.
git does not store owner/group information or atime and mtime information. The backup is content only.
“git add .” is not robust against files that disappear while git’s looking at them (e.g. lock files). It tends to fail with a “cannot stat” message when you really want it to not bother with that file and carry on.
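One partial workaround for the disappearing-file problem is to tell git to ignore the usual culprits through the repository's info/exclude file, which behaves like a .gitignore that lives inside the git directory. The sketch below assumes the patterns are known in advance; add_excludes is a made-up helper and the patterns in the usage note are only examples. It narrows the race rather than removing it.

```shell
#!/bin/sh
# add_excludes GIT_DIR PATTERN...: append ignore patterns to GIT_DIR/info/exclude,
# skipping any pattern that is already listed (hypothetical helper).
add_excludes() {
    git_dir=$1; shift
    mkdir -p "$git_dir/info"
    touch "$git_dir/info/exclude"
    for pattern in "$@"; do
        grep -qxF -- "$pattern" "$git_dir/info/exclude" ||
            printf '%s\n' "$pattern" >>"$git_dir/info/exclude"
    done
}
```

For the example setup this might be called as add_excludes /backup/home.git '*.lock' '.cache/' before the first "git add .".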

bashrc magic vs. terminfo

Tuesday, August 28th, 2007

A while ago, I was fiddling around with a custom terminfo entry to try and get 256 colour mode working for angband with PuTTY. (No, I don’t have time to play; I do sometimes have enough time to compile the latest version and fiddle around with terminfo settings.)

So I managed to get it to work (incorrect initc settings), but at the same time I seemed to lose the capability where the current user, machine and working directory were set in the titlebar. It worked when putty declared itself as being “xterm”, but not when it declared itself as “putty-256color”.

I thought there must be something missing from the putty-256color terminfo entry that I had created, that existed in the xterm terminfo entry. So I trawled through and copied every setting that was even vaguely related across but without success.

As far as I could work out, the settings that should be being used were to do with the “status line”. This is an extra information line that some terminals have that is not part of the main terminal display area. If the terminal has this feature it should have the boolean feature hs (also hs in termcap speak). Then there are three other related features: tsl (or ts in termcap) moves the output location to the status line (to status line), fsl (fs) moves the output location back to the main terminal area (from status line) and dsl (ds) should clear the status line (disable status line). xterm uses the “status line” feature to refer to its window title.
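As a rough illustration of how these capabilities fit together, here is a sketch that writes a string to the status line only when the current terminfo entry claims to have one. set_title is a hypothetical name, and whether the text really ends up in a window title depends entirely on the terminal and its terminfo entry.

```shell
#!/bin/sh
# set_title TEXT: send TEXT to the status line if the terminal advertises one.
# "tput hs" prints nothing; its exit status says whether the boolean is set.
set_title() {
    if tput hs 2>/dev/null; then
        printf '%s%s%s' "$(tput tsl)" "$1" "$(tput fsl)"
    else
        return 1
    fi
}
```

Called as set_title "user@host:~", it outputs tsl, the text and fsl in that order, or returns non-zero and prints nothing when hs is absent.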

These were all set identically in both the xterm and putty-256color terminfo entries, and yet for some reason it wasn’t working in the putty-256color setup. A check of the bash variables revealed that the PROMPT_COMMAND just wasn’t being set in the putty case, despite the terminal supporting all the necessary features.

Finally, I found the source (literally!) of my woes. For some completely unfathomable reason the bashrc supplied with current Fedora releases includes this horrible kludge:

# are we an interactive shell?
if [ "$PS1" ]; then
    case $TERM in
        xterm*)
                if [ -e /etc/sysconfig/bash-prompt-xterm ]; then
                        PROMPT_COMMAND=/etc/sysconfig/bash-prompt-xterm
                else
                        PROMPT_COMMAND='echo -ne "\033]0;${USER}@${HOSTNAME%%.*}:\
                                        ${PWD/#$HOME/~}"; echo -ne "\007"'
                fi
                ;;
# etc, etc.

(backslash newline continuations added for blog readability.)

Argghhh! Hardcoded ANSI escape sequences and terminal names when there is a perfectly reasonable alternative.

I have since replaced the offending part of the script with this more flexible, if mildly less readable, version:

# are we an interactive shell?
if [ "$PS1" ]; then
        if tput hs; then
                PROMPT_COMMAND="echo -n \"$(tput tsl)${USER}@${HOSTNAME%%.*}:\
                                      \${PWD/#${HOME//\//\\/}/~}$(tput fsl)\""
        else
# etc, etc.

It also has the added advantage of only evaluating one shell variable (PWD) each time, the theory being that if you change machines, user or home directory you are probably going to be spawning a new shell in any case.

Spam, now assassinated

Friday, August 10th, 2007

I hadn’t quite realised how much spam was irritating me until I got rid of it all. Previously I had a fixed procmail filter for info@, sales@ addresses for my domain and then let thunderbird perform its adaptive junk mail filtering on anything left. Unfortunately, my primary email address must have been exposed a few months ago as I started receiving a lot more spam.

I thought that I was relatively happy with thunderbird performing the spam filtering task. It correctly detected 95% of spam, and didn’t generate any false positives. However, it is slow. I run thunderbird from a number of different locations, some over slow connections. When an email arrives thunderbird flags up a new mail icon, then downloads it and decides whether it’s spam. Too late! I’ve already seen the new mail icon; I have been disturbed by spam.

I installed spam assassin on my mail handling box, satisfying as many of the optional perl modules as I reasonably could.

I pointed sa-learn at my collection of 3,213 spam messages and at my last year’s worth of legitimate email for comparison.

procmail now puts all of the messages detected as spam by spam-assassin in my “assassinated-spam” folder. I’ve switched off thunderbird’s junk detection and told it not to check the spam folder for new messages. So far spam-assassin has got every single message correct and I am no longer disturbed by spam. I still go into the “assassinated-spam” folder once in a while to check for false positives and admire how much annoyance I have been spared.

spam-assassin rules, go spam-assassin.

make – thank you for the music

Wednesday, July 11th, 2007

Make doesn’t have to be just for complex code building applications.

I have my own postgres database of the albums that I own and have ripped, mainly because I’m a bit picky about the formatting of tag metadata.

I have a script that you can point at a musicbrainz album entry; it downloads the entry in xml format and inserts it into my metadata repository (utf-8, of course), from where I can tweak it as I like in a perl based webapp. The webapp has one really cool killer feature: you can download a zip file with a windows .cmd file and a bunch of metadata text files which, when unzipped in the same directory as a bunch of Track??.wav files, will losslessly compress all of the wav files into my music directory (under appropriately named artist and album sub-directories), tag them all with the correct metadata and finally run a “flac -t” test decompression over all the newly compressed files.

The cool thing is that I can rip my newly arrived album and start listening to it in wav format in the temporary directory into which I’ve ripped it, and take the time to sort out the metadata (spacing, capitalization, special characters) at my leisure and compress, tag and store replaygain data when I’m ready.

The one thing that has been bugging me since I set up my dual core machine is that the compression process uses only one core, as flac runs as a single thread and the cmd file just runs commands sequentially. I’ve popped the lid off the flac libraries before so I vaguely considered hacking in some multithread, multifile feature but this seemed like hard work. I thought about getting the cmd script to spawn two separate command processors each with half the files, but then one might finish early if the track lengths or complexity happened to be radically different. Perhaps I could spawn each compression instance in the background and run them all in parallel, but spawning a dozen processes all at once doesn’t seem very neat and I’d have no neat way of waiting for them all to finish before running the test. Finally I realised that this is really a job for make.

So I tweaked the cmd file generating webapp to create me a makefile instead and now I run “make -j 2” from the working directory. Hey presto, both cores are used and the compression takes about half the time. Here’s a quick sample:

.PHONY: all test

all: test

$(MYMUSIC)/Artist/Album/01_TrackOneTitle.flac: Track01.wav
	flac $< -o $@
	metaflac --remove-all-tags --no-utf8-convert --import-tags-from=track01.txt $@

$(MYMUSIC)/Artist/Album/02_TrackTwoTitle.flac: Track02.wav
	flac $< -o $@
	metaflac --remove-all-tags --no-utf8-convert --import-tags-from=track02.txt $@

# etc, etc, ...

OUTFILES := $(MYMUSIC)/Artist/Album/01_TrackOneTitle.flac \
 $(MYMUSIC)/Artist/Album/02_TrackTwoTitle.flac
# etc, etc, ...

# flac -t only runs once every file in OUTFILES has been built
test: $(OUTFILES)
	flac -t $(OUTFILES)

You can add tags with the flac commandline but I use metaflac and text files because the commandline doesn't handle utf-8 so well. It's more reliable to generate tag import files and tell metaflac to import them straight into the vorbis comment block of the flac file without translation. It just works, and now at twice the speed.
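The generator itself lives in the webapp, but the idea of emitting one rule per track can be sketched in a few lines of shell. This is an illustration only: emit_rule is a made-up function, the Artist/Album path is the placeholder from the sample above, and the metaflac line passes $@ so that metaflac knows which file to tag.

```shell
#!/bin/sh
# emit_rule NUM TITLE: print a make rule that builds one .flac from TrackNUM.wav.
# Recipe lines start with a literal tab (\t), which make requires.
emit_rule() {
    num=$1
    title=$2
    printf '$(MYMUSIC)/Artist/Album/%s_%s.flac: Track%s.wav\n' "$num" "$title" "$num"
    printf '\tflac $< -o $@\n'
    printf '\tmetaflac --remove-all-tags --no-utf8-convert --import-tags-from=track%s.txt $@\n\n' "$num"
}
```

Something like { emit_rule 01 TrackOneTitle; emit_rule 02 TrackTwoTitle; } >>Makefile would append the per-track rules. Because each rule has its own target, make -j is free to run the compressions side by side and still hold the test back until all of them have finished.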

Ramble on

Monday, June 25th, 2007

Well the upgrade to Wordpress 2.2.1 seemed to go relatively harmlessly and while I was at it I set up a cron job to back up my mysql database (not my choice, no postgresql option) to my home computer.

ssh is a very powerful tool. My backup across the network is a really simple shell script along the lines of:

#!/bin/sh
FNAME=${HOME}/mysqlbackup_`date +%F_%H_%M_%S`.sql.bz2
ssh myispaccount "mysqldump -h ispdbserver -u ispusername -pmindyourown\
    mydbname | bzip2 -9" >${FNAME} 2>/dev/null

ssh makes things really easy as when you specify a command, the things you want to happen to stdout and stdin actually happen. So in this example, bzip2 is part of the command sent to the server so the bzipping happens on the remote end, and ssh forwards the stdout of the remote command back to the local machine where I redirect it straight to a file. If I wanted I could have bunzipped it on the local end, but there’s no need. Having the backup saved as a timestamped bzip2ped sql file suits my needs.
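Because the local redirect creates the file before ssh has produced anything, a dropped connection can leave a truncated or empty backup behind. One way round that, sketched here under the assumption that a temporary ".part" name is acceptable, is to rename the file only when the command succeeds. backup_name and backup_to are invented helper names.

```shell
#!/bin/sh
# backup_name: print a timestamped destination path for the dump.
backup_name() {
    printf '%s/mysqlbackup_%s.sql.bz2\n' "$HOME" "$(date +%F_%H_%M_%S)"
}

# backup_to DEST CMD...: run CMD with stdout captured in DEST.part, and rename it
# to DEST only if CMD exits 0 and wrote something. This catches ssh and connection
# failures; an error from mysqldump inside the remote pipeline is still masked,
# because the pipeline's exit status is bzip2's.
backup_to() {
    dest=$1; shift
    tmp="${dest}.part"
    if "$@" >"$tmp" 2>/dev/null && [ -s "$tmp" ]; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With the command from the script above, the call would look like backup_to "$(backup_name)" ssh myispaccount "mysqldump -h ispdbserver -u ispusername -pmindyourown mydbname | bzip2 -9".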

Try this for funky mirroring:

ssh remotehost "tar -c -C remotefolder . | bzip2 -9" | tar -x -j -C localfolder

There’s a slight issue with tar and ssh which means that when you use a compression filter as a tar option (-j or -z) on the remote side it seems to send some trailing garbage at the end, rounding the communication up to some minimum block size. The untar deals quite happily with it, but bzip2 does warn. If you use a pipe at the remote end, this doesn’t seem to happen.

Entry number one

Tuesday, June 19th, 2007

Well this is it, it looks like I’ve managed to install Wordpress in my webspace. Now I just need something to write about…