pexif 0.13 release

Thu, 23 Apr 2009 11:19:06 +0000
tech code pexif python

pexif is the Python library for editing an image’s EXIF data. Somewhat embarrassingly, the last release I made (0.12) had a really stupid bug in it. This has now been rectified, and a new version (0.13) is available.

a build tool framework

Fri, 06 Jun 2008 15:50:47 +0000
code tech article python

I’ve recently started using this tool as the core of my build system for a new project I’m working on. The simplicity involved is pretty neat. Rather than manually needing to work out the dependencies (or having specialised tools for determining the dependencies), you simply write the commands you need to build your project, and the tool works out all the dependencies for you.

So, what’s the catch? Well, the way it works is by using strace to record all the system calls that a program makes during its execution. By analyzing this list, the tool can work out all the files that are touched when a command is run, and then stores this as a list of dependencies for that command. Then, the next time you run the same command, it first checks to see if any of the dependencies have changed (using either md5sum or timestamp), and only runs the command if any of them have. So the catch of course is that this only runs on Linux (as far as I know, you can’t get strace anywhere else, although that doesn’t mean the same techniques couldn’t be used with a different underlying system call tracing tool).
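The "have any dependencies changed?" check can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the function names are made up for this example, and the dependency list is passed in by hand rather than being recovered from strace output.

```python
import hashlib


def file_md5(path):
    """Return the hex MD5 digest of a file, or None if the file is missing."""
    try:
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()
    except FileNotFoundError:
        return None


def record_deps(paths):
    """Snapshot the current digest of every dependency of a command."""
    return {path: file_md5(path) for path in paths}


def needs_rerun(recorded):
    """True if there is no snapshot yet, or any recorded dependency changed."""
    if recorded is None:
        return True
    return any(file_md5(path) != digest for path, digest in recorded.items())
```

A build script would call record_deps() after running a command, keep the result somewhere, and skip the command next time if needs_rerun() returns False.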

This technique is quite a radical departure from other tools, which determine a large dependency graph of the entire build and then recursively work through this graph to fulfil unmet dependencies. As a result, this form is a lot more imperative, rather than declarative, in style. Traditional tools (SCons, make, etc.) provide a language which allows you to essentially describe a dependency graph, and then the order in which things are executed is really hidden inside the tool. Using this tool is a lot different. You go through defining the commands you want to run (in order!), and that is basically it.

Some of the advantages of this approach are:

There are however some disadvantages:

As with many good tools in your programming kit, it is available under a very liberal BSD-style license, which is nice, because I’ve been able to fix up some problems and add some extra functionality. In particular I’ve added options to:

The patch and full file are available. These have of course been provided upstream, so with any luck, some or most of them will be merged upstream.

So, if you have a primarily Linux project, and want to try something different to SCons or make, I’d recommend considering it.

Video - Porting OKL4 to a new SoC

Thu, 29 May 2008 18:45:49 +0000
tech article code okl4

Earlier this year I presented at the embedded miniconf about how to port OKL4 to a new SoC. The video was taped and had until recently been available on the 2008 website, but for some reason that website has gone AWOL, so I thought it was a good time to put up my own copy. These videos have the advantage that they have gone through a painstaking post-production phase, which seamlessly melds the slides into the video (well, not quite seamlessly), and also all the bad bits have been removed.

This presentation gives a really good overview of what is involved in porting OKL4 to a new SoC. However, please note that the specific APIs have been somewhat simplified for pedagogical reasons, so this is more an introduction to the concepts, rather than a tutorial as such.

The videos are available in Ogg/Theora and also Quicktime/H264 formats, and in either CIF (352x288) or PAL (720x576). If you can afford the bandwidth I would recommend the hi-res ones, as then you can actually see what is on the screen.

Simple File Monitoring on Mac OS X

Thu, 15 May 2008 02:21:22 +0000
tech code article osx

Mac OS X has the kevent() system call, which allows you to monitor various kernel events. This is kind of useful, because I want to, well, watch a file, and then do something when it changes. Now, I would have thought I could find something really simple to do this, but I could only find massive GUI programs, which is not so great for scripting.

Anyway, long story short, I decided to write my own. It was pretty straightforward. I thought it was worth documenting how it works so that Benno 5 years from now can remember how to use kevent.

The first important thing you need to do is create a kernel queue using the kqueue() system call. This system call returns a descriptor which you then use on calls to kevent(). These descriptors come out of the file descriptor namespace, but don't actually get inherited on fork().

    int kq;
    kq = kqueue();
    if (kq == -1) {
        err(1, "kq!");
    }

After creating the kernel queue, an event is registered. The EV_SET macro is used to initialise the struct kevent. The 1st argument is the address of the event structure to initialise. The 2nd argument is the file descriptor we wish to monitor. The 3rd argument is the type of event we wish to monitor. In this case we want to monitor the file underlying our file descriptor, which is the EVFILT_VNODE event. The 5th argument is some filter-specific flags, in this case NOTE_WRITE, which means we want to get an event when the file is modified. The 4th argument describes what action to perform when the event happens. In particular we want the event added to the queue, so we use EV_ADD | EV_CLEAR. The EV_ADD is obvious, but EV_CLEAR less so. The NOTE_WRITE event is triggered by the first write to the file after registration, and remains set. This means that you continue to receive the event indefinitely. By using the EV_CLEAR flag, the state is reset, so that an event is only delivered once for each write. (Actually it could be less than once per write, since events are coalesced.) The final arguments are data values, which aren't used for our event.

The kevent system call actually registers the event we initialised with EV_SET. The kevent function takes the kqueue descriptor as the 1st argument. The 2nd and 3rd arguments are a list of events to register (pointer and length). In this case we register the event we just initialised. The 4th and 5th arguments are a list of events to receive (in this case empty). The final argument is a timeout, which is not relevant in this case (as we aren't receiving any events).

    struct kevent ke;
    EV_SET(&ke,
           /* the file we are monitoring */ fd,
           /* we monitor vnode changes */ EVFILT_VNODE,
           /* when the file is written add an event, and then clear the
              condition so it doesn't re-fire */ EV_ADD | EV_CLEAR,
           /* just care about writes to the file */ NOTE_WRITE,
           /* don't care about value */ 0, NULL);
    r = kevent(kq, /* register list */ &ke, 1, /* event list */ NULL, 0, /* timeout */ NULL);
    if (r == -1) {
        err(1, "kevent failed");
    }

After we have registered our event we go into an infinite loop receiving events. This time we aren't setting up any events, so the list to register is simply NULL. But the 4th and 5th arguments provide a list of up to 1 item to receive. In this case we still don't want a timeout. We want to check that the event we received was what we expected, so we assert it is true.

        r = kevent(kq,
                   /* register list */ NULL, 0,
                   /* event list */ &ke, 1,
                   /* timeout */ NULL);
        if (r == -1) {
            err(1, "kevent");
        }
        assert(ke.filter == EVFILT_VNODE && ke.fflags & NOTE_WRITE);

The aim of this program is to run a shell command whenever a file changes. Simply getting the write is not good enough. A program that is updating a file will cause a number of consecutive writes, and since it is likely that our shell command is going to want to operate on a file that is in a consistent state, we want to try and ensure the file is at a quiescent point. UNIX doesn't really provide a good way of doing this. Well, actually, there are a bunch of file locking APIs, but I guess I haven't really used them much, and it isn't clear if the program writing the file would be using them, and as far as I can tell, it would have had to be written using the same locking mechanism. Also, the commands I want to run are only going to be reading the file, not writing to it, so at worst I'm going to end up with some broken output until the next write. Anyway, to get something that will work almost all the time, I've implemented a simple debouncing technique. It is a simple loop that waits until the file is not written to for 0.5 seconds. 0.5 seconds is a good tradeoff between latency and ensuring the file is quiescent. Of course it is far from ideal, but it will do.

To implement this a struct timespec object is created to pass as the timeout parameter to kevent.

    struct timespec debounce_timeout;
    /* Set debounce timeout to 0.5 seconds */
    debounce_timeout.tv_sec = 0;
    debounce_timeout.tv_nsec = 500000000;

In the debounce loop, kevent is used, but this time passed with the 0.5 second timeout.

        /* debounce */
        do {
            r = kevent(kq,
                   /* register list */ NULL, 0,
                   /* event list */ &ke, 1,
                   /* timeout */ &debounce_timeout);
            if (r == -1) {
                err(1, "kevent");
            }
        } while (r != 0);
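For comparison, the shape of the debounce loop can be sketched in Python, independent of kqueue. The poll callable is an assumption of this sketch: anything that blocks for up to the given number of seconds and returns a (possibly empty) list of events. On Mac OS X, something like lambda t: kq.control(None, 1, t) on a select.kqueue() object would be one candidate.

```python
def debounce(poll, quiet_seconds=0.5):
    """Block until poll() sees no events for quiet_seconds.

    Returns the number of extra events swallowed while waiting.
    """
    swallowed = 0
    # A non-empty result means another write arrived before the timeout,
    # so the quiet period starts again.
    while poll(quiet_seconds):
        swallowed += 1
    return swallowed
```

The loop exits only when a whole quiet period passes with nothing to report, which is the same condition as r == 0 in the C version.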

Finally after the debounce, we run the command that the user specified on the command line. The following code shows the declaration, initialisation and execution of the command.

    char *command;
    command = argv[2];

Using simplefilemon is easy, e.g.: simplefilemon filename "command to run".

You can compile simplefilemon.c with gcc simplefilemon.c -o simplefilemon.

Download: simplefilemon.c

pexif 0.11 released

Thu, 27 Mar 2008 13:22:11 +0000
pexif code python tech

I released a new version of pexif today. This release fixes some small bugs and now deals with files containing multiple application markers. This means files that have XMP metadata now work.

Now I just wish I had time to actually use it for its original purpose of attaching geo data to my photos.

seek support in pyannodex

Sun, 27 Aug 2006 15:48:22 +0000
annodex python code tech

One of the problems with pyannodex was that you could only iterate through a list of clips once. That is, in something like:

import sys
import annodex

anx = annodex.Reader(sys.argv[1])

for clip in anx.clips:
    print(clip.start)

for clip in anx.clips:
    print(clip.start)

Only the first loop would print anything. This is basically because clips returned an iterator, and once the iterator had run through once, it didn't reset itself. I had originally solved this in a really stupid way, whereby I used class properties to recreate an iterator object each time a clip object was returned. This was obviously silly and I fixed it properly by resetting the file handle in the __iter__ method of my iterator object.

When reviewing this code I also found a nasty bug. The way my iterator worked relied on each read call causing at most one callback, which wasn't actually what was happening. Luckily this is also fixable by having callback functions return ANX_STOP_OK, rather than ANX_CONTINUE. Anyway, there is now a new version up which fixes these problems.
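The reset-in-__iter__ idea looks roughly like this. This is a minimal sketch and not the pyannodex code: the Clips class is made up for illustration, and lines of a plain file handle stand in for clips.

```python
import io


class Clips:
    """Iterate over the lines of a file handle, as many times as you like."""

    def __init__(self, handle):
        self.handle = handle

    def __iter__(self):
        # Rewind on every new loop, so a second `for` starts from the top
        # instead of finding the handle already exhausted.
        self.handle.seek(0)
        return self

    def __next__(self):
        line = self.handle.readline()
        if not line:
            raise StopIteration
        return line.rstrip("\n")


clips = Clips(io.StringIO("intro\nchapter one\n"))
first_pass = list(clips)
second_pass = list(clips)   # without the seek(0) this would be empty
```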

On robust scripts

Thu, 08 Jun 2006 11:01:54 +0000
tech code python

When you write scripts that run as cron jobs, and send email to people, and have the potential to send a *lot* of email to people, you really don't want to screw up.

Unfortunately I did screw up when writing one of these. It was a pretty simple Python script, 200 lines or so, that would find any new revisions that had been committed since the last time it ran, and email the commit messages to developers.

The idea was simple: a file kept a list of the last seen revisions, I would go through the archive to find new revisions, mail them out, and then finally write out the file with the latest revisions.

Spot the bug, or at least the design error? When our server ran out of disk space, the stage of writing out the file with the last seen revisions failed, and created an empty file. So the next time the script ran it thought all the revisions were new, resulting in thousands of emails about revisions committed years ago. I pity our poor sysadmin, who not only had to deal with out-of-disk problems but now also with a mail queue holding thousands of messages.

The solution to the problem, of course, is to try and write out the new revision file before sending the email, and to write it to a temporary file, instead of blasting the last known good one out of existence by writing directly over the top of it.
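A minimal sketch of the write-to-a-temporary-file step in Python follows. The save_revisions name and the one-revision-per-line file format are assumptions for this example, not the original script.

```python
import os
import tempfile


def save_revisions(path, revisions):
    """Replace path with the new revision list, or leave it untouched on failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(revisions) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())   # surface out-of-disk errors here
        os.replace(tmp_path, path)   # swap in the new file in one step
    except BaseException:
        os.unlink(tmp_path)          # the old file is still intact
        raise
```

If the disk fills up, the exception is raised before os.replace() runs, so the last known good file survives and the script can bail out before sending any mail.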

I guess the moral is that designing these little scripts actually requires more care than I usually give them.


Mon, 29 May 2006 18:05:52 +0000
tech code

I do a lot of hacking with binary files at work, and simply doing a diff to check if two files are the same isn't really useful; I usually want to preprocess each file with some command to turn it into useful plain text. This ends up being quite tedious by hand. I came up with this simple script, diffcmd, but I'm sure the lazyweb will let me know of anything better if it exists.

Usage: % diffcmd "readelf -a" file1 file2



diff -u $TMP1 $TMP2

rm $TMP1 $TMP2
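The same idea can be sketched in Python without temporary files. This is an illustrative equivalent rather than a port of the script above; the function name simply mirrors the script's name.

```python
import difflib
import shlex
import subprocess


def diffcmd(command, file1, file2):
    """Run command on each file and return a unified diff of the two outputs."""
    outputs = []
    for name in (file1, file2):
        result = subprocess.run(shlex.split(command) + [name],
                                capture_output=True, text=True, check=True)
        outputs.append(result.stdout.splitlines(keepends=True))
    return list(difflib.unified_diff(outputs[0], outputs[1],
                                     fromfile=file1, tofile=file2))
```

For example, print("".join(diffcmd("readelf -a", "file1", "file2"))) gives roughly what the usage line above would show.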

Sun, 12 Mar 2006 09:50:24 +0000
tech maps python code

Today I released the first version of nswgeo. It is a simple Python script that queries the NSW Department of Lands GeoSpatialPortal to find the location of addresses in NSW.