pexif is a Python library for editing an image’s EXIF data. Somewhat embarrassingly, the last release I made (0.12) had a really stupid bug in it. This has now been rectified, and a new version (0.13) is available.
I’ve recently started using memoize.py
as the core of my build system for a new project I’m working on. The
simplicity involved is pretty neat. Rather than needing to
work out the dependencies manually (or having specialised tools for
determining them), with memoize.py you
simply write the commands you need to build your project, and
memoize.py works out all the dependencies for you.
So, what’s the catch? Well, the way memoize.py works
is by using strace to
record all the system calls that a program makes during its
execution. By analysing this list memoize.py can work out
all the files that are touched when a command is run, and it stores
these as a list of dependencies for that command. The next time
you run the same command, memoize.py first checks whether
any of the dependencies have changed (using either md5sum or
timestamps), and only runs the command if they have.
So the catch, of course, is that this only runs on Linux (as
far as I know you can’t get strace anywhere else, although that
doesn’t mean the same technique couldn’t be used with a different
underlying system call tracing tool).
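The core idea can be sketched in a few lines of Python (a simplified model, not memoize.py’s actual code): hash every file a command touched last time, and re-run the command only if any hash has changed. Here the touched-file list is passed in by hand; memoize.py derives it automatically from the strace output.

```python
import hashlib
import os
import subprocess

# deps maps a command to the files it touched on its last run,
# along with each file's md5 at that time.
deps = {}

def md5sum(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def memoized_run(cmd, touched_files):
    """Run cmd only if a previously recorded dependency changed.

    touched_files stands in for the list strace would discover.
    Returns True if the command was run, False if it was skipped.
    """
    old = deps.get(cmd)
    current = {f: md5sum(f) for f in touched_files if os.path.exists(f)}
    if old is not None and old == current:
        return False  # everything up to date, skip the command
    subprocess.check_call(cmd, shell=True)
    # Re-hash after running, since the command may have rewritten files.
    deps[cmd] = {f: md5sum(f) for f in touched_files if os.path.exists(f)}
    return True
```

The real tool also persists this dependency map to disk between runs, which is what makes it usable as a build system.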
This technique is radically different from that of other tools, which
determine a large dependency graph of the entire build and then
recursively work through this graph to fulfil unmet dependencies. As
a result this approach is a lot more imperative than declarative in
style. Traditional tools (SCons, make, etc.) provide a language which
essentially allows you to describe a dependency graph, and the
order in which things are executed is hidden inside the tool.
Using memoize.py is a lot different: you go through
defining the commands you want to run (in order!), and that is
basically it.
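In practice a memoize.py build script is just a list of commands in order. The sketch below uses a stub in place of the real module, since the exact name and signature of memoize.py’s entry point is an assumption here; check its source for the real API.

```python
import subprocess

def memoize(cmd):
    # Stand-in for memoize.py's entry point (the real module's exact
    # API is an assumption; check its source). memoize.py would skip
    # the command when its traced dependencies are unchanged; this
    # stub just runs it and returns the exit code.
    return subprocess.call(cmd, shell=True)

# The whole "build description" is just commands, in order:
memoize("echo compiling main.c")
memoize("echo compiling util.c")
memoize("echo linking app")
```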
Some of the advantages of this approach are:
There are however some disadvantages, one being the reliance on
ptrace to perform the system call tracing. I've modified memoize.py
a little so that you can simply choose not to run strace. Obviously
you can’t determine dependencies in this case, but you can at least build the
thing.
As with many good tools in your programming kit, memoize.py is
available under a very liberal BSD style license, which is nice, because I’ve
been able to fix up some problems and add some extra functionality. In particular
I’ve added options to:
The patch and full file are available. These have of course been provided to the author, so with any luck some or most of them will be merged upstream.
So, if you have a primarily Linux project, and want to try something
different to SCons, or make, I’d recommend considering memoize.py.
Earlier this year I presented at the linux.conf.au embedded miniconf on how to port OKL4 to a new SoC. The talk was taped and had until recently been available on the linux.conf.au 2008 website, but for some reason that website has gone AWOL, so I thought it was a good time to put up my own copy. These videos have the advantage of having gone through a painstaking post-production phase, which seamlessly melds the slides into the video (well, not quite seamlessly), and all the bad bits have been removed.
This presentation gives a really good overview of what is involved in porting OKL4 to a new SoC. However, please note that the specific APIs have been somewhat simplified for pedagogical reasons, so this is more an introduction to the concepts, rather than a tutorial as such.
The videos are available in Ogg/Theora and also Quicktime/H264 formats, in either CIF (352x288) or PAL (720x576). If you can afford the bandwidth I would recommend the hi-res ones, as then you can actually see what is on the screen.
Mac OS X has the kevent()
system call, which allows you to monitor various kernel events. This is kind of
useful, because I want to, well, watch a file, and then do something when it
changes. Now, I would have thought I could find something really simple to
do this, but I could only find massive GUI programs, which
is not so great for scripting.
Anyway, long story short, I decided to write my own. It was pretty
straightforward. I thought it was worth documenting how it works so that Benno 5 years
from now can remember how to use kevent.
The first thing you need to do is create a kernel queue using
the kqueue()
system call. This system call returns a descriptor which you
then use on calls to kevent(). These descriptors come
out of the file descriptor namespace, but don't actually get inherited
on fork().
int kq;

kq = kqueue();
if (kq == -1) {
    err(1, "kq!");
}
After creating the kernel queue, an event is registered. The
EV_SET macro is used to initialise the struct
kevent. The 1st argument is the address of the event structure
to initialise. The 2nd argument is the file descriptor we wish to
monitor. The 3rd argument is the type of event we wish to monitor. In
this case we want to monitor the file underlying our file descriptor,
which is the EVFILT_VNODE filter. The 5th argument is
filter-specific flags, in this case NOTE_WRITE,
which means we want to get an event when the file is modified. The
4th argument describes what action to perform when the event
happens. In particular we want the event added to the queue, so we use
EV_ADD | EV_CLEAR. The EV_ADD is obvious,
but EV_CLEAR less so. The NOTE_WRITE event
is triggered by the first write to the file after registration, and
remains set. This means that you would continue to receive the event
indefinitely. By using the EV_CLEAR flag, the state is
reset, so that an event is only delivered once for each
write. (Actually it could be less than once
per write, since events are coalesced.) The final arguments
are data values, which aren't used for our event.
The kevent system call actually registers the
event we initialised with EV_SET. The kevent
function takes the kqueue descriptor as its 1st argument. The 2nd and
3rd arguments are a list of events to register (pointer and length); in this
case we register the event we just initialised. The 4th and 5th arguments
are a list of events to receive (in this case empty). The final argument
is a timeout, which is not relevant here (as we aren't
receiving any events).
struct kevent ke;

EV_SET(&ke,
       /* the file we are monitoring */ fd,
       /* we monitor vnode changes */ EVFILT_VNODE,
       /* when the file is written add an event, and then clear the
          condition so it doesn't re-fire */ EV_ADD | EV_CLEAR,
       /* just care about writes to the file */ NOTE_WRITE,
       /* don't care about value */ 0, NULL);

r = kevent(kq, /* register list */ &ke, 1, /* event list */ NULL, 0, /* timeout */ NULL);
if (r == -1) {
    err(1, "kevent failed");
}
After we have registered our event we go into an infinite loop
receiving events. This time we aren't setting up any events,
so the list to register is simply NULL, but the
4th and 5th arguments give a list of up to 1 item to receive.
In this case we still don't want a timeout. We want to check
that the event we received was what we expected, so we assert
that it is.
r = kevent(kq,
           /* register list */ NULL, 0,
           /* event list */ &ke, 1,
           /* timeout */ NULL);
if (r == -1) {
    err(1, "kevent");
}
assert(ke.filter == EVFILT_VNODE && ke.fflags & NOTE_WRITE);
The aim of this program is to run a shell command whenever a file
changes. Simply getting the write event is not good enough. A
program that is updating a file will cause a number of consecutive
writes, and since our shell command is likely
going to want to operate on a file that is in a consistent state, we want
to try and ensure the file is at a quiescent point. UNIX doesn't
really provide a good way of doing this. Well, actually, there is a
bunch of file locking APIs, but I haven't really used them
much, it isn't clear that the file writer would be using them, and
as far as I can tell the program writing the file would have to use
the same locking mechanism. Also, the commands I want to run are
only going to be reading the file, not writing to it, so at worst I'm
going to end up with some broken output until the next write. Anyway,
to get something that will work almost all the time, I've implemented
a simple debouncing technique. It is a simple loop that waits until
the file has not been written to for 0.5 seconds, which is a reasonable tradeoff
between latency and ensuring the file is quiescent. Of course it is
far from ideal, but it will do.
To implement this a struct timespec object is created
to pass as the timeout parameter to kevent.
struct timespec debounce_timeout;

/* Set debounce timeout to 0.5 seconds */
debounce_timeout.tv_sec = 0;
debounce_timeout.tv_nsec = 500000000;
In the debounce loop, kevent is used again, but this time
with the 0.5 second timeout.
/* debounce */
do {
    r = kevent(kq,
               /* register list */ NULL, 0,
               /* event list */ &ke, 1,
               /* timeout */ &debounce_timeout);
    if (r == -1) {
        err(1, "kevent");
    }
} while (r != 0);
Finally after the debounce, we run the command that the user specified on the command line. The following code shows the declaration, initialisation and execution of the command.
char *command;

command = argv[2];

system(command);
Using simplefilemon is easy. E.g.: simplefilemon filename "command to run".
You can compile simplefilemon.c with gcc simplefilemon.c -o simplefilemon.
Download: simplefilemon.c
I released a new version of pexif today. This release fixes some small bugs and now deals with files containing multiple application markers. This means files that have XMP metadata now work.
Now I just wish I had time to actually use it for its original purpose of attaching geo data to my photos.
One of the problems with pyannodex was that you could only iterate through a list of clips once. That is in something like:
anx = annodex.Reader(sys.argv[1])
for clip in anx.clips:
    print clip.start
for clip in anx.clips:
    print clip.start
Only the first loop would print anything. This is basically because
clips returned an iterator, and once the iterator had run through once it
wasn't reset. I had originally (in 0.7.3.1) solved this in
a really stupid way, whereby I used class properties to recreate an
iterator object each time a clip object was returned. This was obviously
silly, and I fixed it properly by resetting the file handle in the __iter__
method of my iterator object.
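The fix can be illustrated with a toy version (a hypothetical class, not pyannodex's actual code, and using Python 3 syntax): the iterator rewinds its underlying file handle in __iter__, so a second for-loop starts from the beginning instead of hitting an exhausted iterator.

```python
class ClipIterator:
    """Iterates over the lines of a file, rewinding on each new iteration."""

    def __init__(self, path):
        self.fh = open(path)

    def __iter__(self):
        # The fix: seek back to the start so a second for-loop sees
        # the items again rather than an already-exhausted iterator.
        self.fh.seek(0)
        return self

    def __next__(self):
        line = self.fh.readline()
        if not line:
            raise StopIteration
        return line.strip()
```

Without the seek in __iter__, the second pass over the same object yields nothing, which is exactly the bug described above.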
When reviewing this code I also found a nasty bug. The way my iterator worked relied on each read call causing at most one callback, which wasn't actually what was happening. Luckily this was also fixable, by having the callback functions return ANX_STOP_OK rather than ANX_CONTINUE. Anyway, there is now a new version up which fixes these problems.
When you write scripts that run as cron jobs, send email to people, and have the potential to send a *lot* of email to people, you really don't want to screw up.
Unfortunately I did screw up when writing one of these. It was a pretty simple Python script, 200 lines or so, that would find any new revisions that had been committed since the last time it ran, and email the commit messages to developers.
The idea was simple: a file kept a list of the last seen revisions; I would go through the archive finding new revisions, mail them out, and then finally write out the file with the latest revisions.
Spot the bug, or at least the design error? When our server ran out of disk space, the stage of writing out the file with the last seen revisions failed and created an empty file. So the next time the script ran it thought all the revisions were new, resulting in thousands of emails about revisions committed years ago. I pity our poor sysadmin, who not only had to deal with out-of-disk problems but now also with a mail queue containing thousands of messages.
The solution to the problem, of course, is to write out the new revision file before sending the email, and to write it to a temporary file first, instead of blasting the last known good version out of existence by writing directly over the top of it.
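In Python the safe pattern looks something like this (a sketch, with made-up file names): write the new state to a temporary file in the same directory, then atomically rename it over the old one, so a failed write can never leave an empty state file behind.

```python
import os
import tempfile

def save_state(path, revisions):
    """Atomically replace the last-seen-revisions file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Write to a temp file on the same filesystem first; if the disk
    # fills up here, the old known-good file is untouched.
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(revisions))
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX: readers see either the old file or the new
        # one, never a partial or empty file.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Doing this before sending any mail means a crash at any point leaves the script able to pick up where it left off, at worst re-sending a few messages rather than years of history.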
I guess the moral is that designing these little scripts actually requires more care than I usually give them.
I do a lot of hacking on binary files at work, and simply doing a diff to check whether two files are the same isn't really useful; I usually want to preprocess each file with some command to turn it into useful plain text. This ends up being quite tedious by hand. I came up with this simple script, diffcmd, but I'm sure the lazyweb will let me know of anything better if it exists.
Usage: % diffcmd "readelf -a" file1 file2
#!/bin/sh
CMD=$1
FILE1=$2
FILE2=$3
TMP1=`tempfile`
TMP2=`tempfile`
$CMD $FILE1 > $TMP1
$CMD $FILE2 > $TMP2
diff -u $TMP1 $TMP2
rm $TMP1 $TMP2
Today I released the first version of nswgeo. It is a simple Python script that queries the NSW Department of Lands GeoSpatialPortal to find the location of addresses in NSW.