Video - Porting OKL4 to a new SoC

Thu, 29 May 2008 18:45:49 +0000
tech article code okl4

Earlier this year I presented at the linux.conf.au embedded miniconf about how to port OKL4 to a new SoC. The talk was taped and had until recently been available on the linux.conf.au 2008 website, but for some reason that website has gone AWOL, so I thought it was a good time to put up my own copy. These videos have the advantage of having gone through a painstaking post-production phase, which seamlessly melds the slides into the video (well, not quite seamlessly), and all the bad bits have been removed.

This presentation gives a really good overview of what is involved in porting OKL4 to a new SoC. However, please note that the specific APIs have been somewhat simplified for pedagogical reasons, so this is more an introduction to the concepts, rather than a tutorial as such.

The videos are available in Ogg/Theora and also Quicktime/H.264 formats, in either CIF (352x288) or PAL (720x576) resolution. If you can afford the bandwidth I would recommend the hi-res ones, as then you can actually see what is on the screen.

Musings on literate programming

Wed, 21 May 2008 15:35:14 +0000
tech article programming

You know that any blog post with musings in the title is going to be a lot of navel-gazing babble, so I don’t blame you if you skip out now; this is mostly for me to consolidate my thoughts on literate programming.

The idea behind literate programming is that you should write programs with a human reader as the primary audience, and the compiler as the secondary audience. This means you organise your program in a logical order that aids explanation of the program (think chapters and sections), rather than in a way that is oriented towards the compiler (think files and functions).

One of the outcomes of writing your programs in this literate manner is that you think a lot more about how to explain things to another programmer (who is your audience) than you do when writing with the compiler as your audience. I’m quite interested in things that can improve the quality of my code, and of my team’s code, so I thought I’d try it out.

I first tried a somewhat traditional tool called noweb. I took a fairly complex module of a kernel that I’m writing as the base for this. The output I produced was some quite nice looking LaTeX, which I think did a good job of explaining the code, as well as some of the design decisions that might otherwise have been difficult to communicate to another programmer. I was able to structure my prose in a way that I thought was quite logical for presenting the ideas, but which ended up being quite different to the actual structure of the original code. It is no surprise that the tool that takes the source file and generates source files to be used by the compiler is called tangle. Unfortunately I can’t really share the output of this experiment as the code is closed (at the moment).

While I liked the experience of using noweb, it seemed a lot like a case of write the code, then write the documentation; going back to modify the code afterwards would be a real nightmare. There is a lot of evidence (i.e. working bodies of code) that a body of code can be worked on by multiple people at once reasonably effectively. I’m yet to see a piece of literature that can be effectively modified by multiple parties. (And no, Wikipedia doesn’t count.)

One person who agrees is Zed Shaw. He agreed so much that he made his own tool, Idiopidae, which allows you to keep your code, but then separately create documentation describing that code, in a mostly literate manner, in a separate file. This seemed like a good alternative, and I tried it out when documenting simplefilemon. Here the documentation is separate, but the code has markers in it so that the documentation can effectively refer to blocks of code, which goes some way to eliminating the problems of traditional literate programming. For a start, syntax highlighting actually worked! (Yes, emacs has ways of doing dual major modes, but none of them really worked particularly well.)

With this approach, I have to admit it felt less literate, which is pretty wishy-washy, but I felt more like I was documenting the code, rather than really taking a holistic approach to explaining the program. That isn’t exactly a good explanation, but it definitely felt different to the other approach. Maybe I felt dirty because I wasn’t following the Knuth religion to the letter. I think this approach probably has more legs, but I did end up with a lot of syntactic garbage in the source file, which made it more difficult to read than it should have been. Also, I couldn’t find a good way of summarising large chunks of code. For example, I couldn’t present code for a loop with the body replaced by a reference to another block of code, which is one of the nice things I could do in noweb. Of course that is probably something that can be added to the tool in the future, and isn’t really the end of the world.

Where to go next? Well, I think I’m going to try and go back and reproduce my original kernel documentation using Idiopidae to see what the experience is when only modifying one variable (the tool), and see how that goes. If that can produce something looking reasonably good, I think I might invest some time in extending Idiopidae to get it working exactly how I want it to.

VMware fusion, hard links, and zsh

Wed, 21 May 2008 11:52:57 +0000
article tech osx

While I end up using Mac OS X as my primary GUI, I still do a lot of development work on Linux. I'm using VMware Fusion to host a virtual headless Linux machine, which is all good. Recently I decided to upgrade my OS to Ubuntu 8.04, which promotes having a just-enough OS (JeOS); this seemed perfect for what I wanted to do. Unfortunately the process of getting the VMware client tools installed was less than simple. To cut a long story short, the fix is described by Peter Cooper, and things work well after that. (It is a little annoying that the Ubuntu documentation doesn't explain this, or link to it.)

Anyway, after this I'm able to share my home directory directly between OS X and my virtual machine, which is absolutely fantastic, as it means I'm not using TRAMP or some network filesystem to shuffle files back and forth between the virtual machine and the main machine.

Unfortunately, I ran into a bit of a problem: history was not working in zsh. Specifically, saving the history into the history file was not working, which is a really painful situation. It was not really clear why; running fc -W manually didn't work either, but managed to fail silently, with no stderr output and no error code returned. Failing that, I went back to the massively useful debugging tool strace. This finally gave me the clue that link() (hard linking) was failing. I confirmed that using ln.

So, it turns out that the VMware hgfs file system doesn't support hard linking, which is a real pain, especially since the underlying OS X file system does support it. So I'm down to the workaround of storing my history file in /tmp rather than my home directory, which is slightly annoying, but not the end of the world.
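For anyone hitting the same thing, the workaround amounts to a couple of lines in ~/.zshrc; a minimal sketch (the /tmp filename is just my choice, and SAVEHIST/HISTSIZE values are arbitrary):

```shell
# Keep the zsh history file on a local filesystem (e.g. /tmp) instead
# of the hgfs-shared home directory, since hgfs lacks the hard-link
# support zsh needs when saving history.
HISTFILE=/tmp/${USER}-zsh-history
SAVEHIST=1000
HISTSIZE=1000
```

The obvious downside is that /tmp is typically cleared on reboot, so history doesn't survive as long as it would in the home directory.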

As it turns out I'm not the first to discover this; Roger C. Clermont also found this out a few days ago. With any luck we will find a solution in the near future.

Simple File Monitoring on Mac OS X

Thu, 15 May 2008 02:21:22 +0000
tech code article osx

Mac OS X has the kevent() system call, which allows you to monitor various kernel events. This is kind of useful, because I want to, well, watch a file, and then do something when it changes. Now, I would have thought I could find something really simple to do this, but I could only find massive GUI programs, which is not so great for scripting.

Anyway, long story short, I decided to write my own. It was pretty straightforward. I thought it was worth documenting how it works so that Benno 5 years from now can remember how to use kevent.

The first thing you need to do is create a kernel queue using the kqueue() system call. This system call returns a descriptor which you use on subsequent calls to kevent(). These descriptors come out of the file descriptor namespace, but don't actually get inherited on fork().

int kq;

    kq = kqueue();
    if (kq == -1) {
        err(1, "kq!");
    }

After creating the kernel queue, an event is registered. The EV_SET macro is used to initialise the struct kevent. The 1st argument is the address of the event structure to initialise. The 2nd argument is the file descriptor we wish to monitor. The 3rd argument is the type of event we wish to monitor; in this case we want to monitor the file underlying our file descriptor, which is the EVFILT_VNODE filter. The 5th argument is filter-specific flags, in this case NOTE_WRITE, which means we want to get an event when the file is modified. The 4th argument describes what action to perform when the event happens. In particular we want the event added to the queue, so we use EV_ADD | EV_CLEAR. The EV_ADD is obvious, but EV_CLEAR less so. The NOTE_WRITE condition is triggered by the first write to the file after registration, and remains set. This means that you would continue to receive the event indefinitely. By using the EV_CLEAR flag, the state is reset, so that an event is only delivered once for each write. (Actually it could be less than once per write, since events are coalesced.) The final arguments are data values, which aren't used for our event.

The kevent system call actually registers the event we initialised with EV_SET. The kevent function takes the kqueue descriptor as the 1st argument. The 2nd and 3rd arguments are a list of events to register (pointer and length); in this case we register the event we just initialised. The 4th and 5th arguments are a list of events to receive (in this case empty). The final argument is a timeout, which is not relevant in this case (as we aren't receiving any events).

struct kevent ke;

    EV_SET(&ke,
           /* the file we are monitoring */ fd,
           /* we monitor vnode changes */ EVFILT_VNODE,
           /* when the file is written add an event, and then clear the
              condition so it doesn't re-fire */ EV_ADD | EV_CLEAR,
           /* just care about writes to the file */ NOTE_WRITE,
           /* don't care about value */ 0, NULL);
    r = kevent(kq, /* register list */ &ke, 1, /* event list */ NULL, 0, /* timeout */ NULL);

    if (r == -1) {
        err(1, "kevent failed");
    }

After we have registered our event we go into an infinite loop receiving events. This time we aren't registering any events, so the register list is simply NULL, but the 4th and 5th arguments give a list of up to 1 event to receive. In this case we still don't want a timeout. We want to check that the event we received was what we expected, so we assert that it is.

        r = kevent(kq,
                   /* register list */ NULL, 0,
                   /* event list */ &ke, 1,
                   /* timeout */ NULL);
        if (r == -1) {
            err(1, "kevent");
        }
        assert(ke.filter == EVFILT_VNODE && ke.fflags & NOTE_WRITE);

The aim of this program is to run a shell command whenever a file changes. Simply getting the write event is not good enough. A program that is updating a file will cause a number of consecutive writes, and since it is likely that our shell command is going to want to operate on a file that is in a consistent state, we want to try and ensure the file is at a quiescent point. UNIX doesn't really provide a good way of doing this. Well, actually, there are a bunch of file locking APIs, but I haven't really used them much, and as far as I can tell the writing program would have had to be using the same locking mechanism. Also, the commands I want to run are only going to be reading the file, not writing to it, so at worst I'm going to end up with some broken output until the next write. Anyway, to get something that will work almost all the time, I've implemented a simple debouncing technique: a loop that waits until the file has not been written to for 0.5 seconds. 0.5 seconds is a good tradeoff between latency and ensuring the file is quiescent. Of course it is far from ideal, but it will do.

To implement this a struct timespec object is created to pass as the timeout parameter to kevent.

struct timespec debounce_timeout;

    /* Set debounce timeout to 0.5 seconds */
    debounce_timeout.tv_sec = 0;
    debounce_timeout.tv_nsec = 500000000;

In the debounce loop, kevent is used, but this time passed with the 0.5 second timeout.

        /* debounce */
        do {
            r = kevent(kq,
                       /* register list */ NULL, 0,
                       /* event list */ &ke, 1,
                       /* timeout */ &debounce_timeout);
            if (r == -1) {
                err(1, "kevent");
            }
        } while (r != 0);

Finally after the debounce, we run the command that the user specified on the command line. The following code shows the declaration, initialisation and execution of the command.

char *command;

    command = argv[2];

        system(command);

Using simplefilemon is easy. E.g: simplefilemon filename "command to run".

You can compile simplefilemon.c with gcc simplefilemon.c -o simplefilemon.
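For reference, here is a minimal self-contained program assembling the fragments above; this is my reconstruction of how they fit together, so the real simplefilemon.c may differ in detail. It only compiles on kqueue-capable systems (Mac OS X or the BSDs):

```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <assert.h>
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int kq, fd, r;
    struct kevent ke;
    struct timespec debounce_timeout;
    char *command;

    if (argc != 3)
        errx(1, "usage: %s filename command", argv[0]);

    fd = open(argv[1], O_RDONLY);
    if (fd == -1)
        err(1, "open");
    command = argv[2];

    /* Set debounce timeout to 0.5 seconds */
    debounce_timeout.tv_sec = 0;
    debounce_timeout.tv_nsec = 500000000;

    kq = kqueue();
    if (kq == -1)
        err(1, "kq!");

    /* register interest in writes to the file */
    EV_SET(&ke, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR, NOTE_WRITE, 0, NULL);
    r = kevent(kq, &ke, 1, NULL, 0, NULL);
    if (r == -1)
        err(1, "kevent failed");

    for (;;) {
        /* block until the file is written to */
        r = kevent(kq, NULL, 0, &ke, 1, NULL);
        if (r == -1)
            err(1, "kevent");
        assert(ke.filter == EVFILT_VNODE && ke.fflags & NOTE_WRITE);

        /* debounce: wait until no write arrives for 0.5 seconds */
        do {
            r = kevent(kq, NULL, 0, &ke, 1, &debounce_timeout);
            if (r == -1)
                err(1, "kevent");
        } while (r != 0);

        system(command);
    }
}
```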

Download: simplefilemon.c

At CELF and Embedded Systems Conference this week

Mon, 14 Apr 2008 16:30:31 +0000
tech linux embedded celf photos

I'm in San Jose at the moment for both the CELF Embedded Linux Conference and the Embedded Systems Conference (ESC). (Which are conveniently scheduled at the same time, in different places!) I'm not quite sure how much of each I'll see. I'm primarily going to be at CELF, but will probably end up spending some time as booth babe at the Open Kernel Labs stand at ESC.

Most importantly, there will be beer at Gordon Biersch (San Jose) on Tuesday night from around 7pm. (Not Thursday night as I may have told people previously; of course if anyone wants to meet up on Thursday as well, that works too.)

I did manage to take a quick break from work yesterday and took advantage of the awesome weather in northern California to drive down to Big Sur along Highway 1. The scenery was pretty spectacular. Hopefully I won't have a sprained ankle next time and will be able to do some hiking.

Big Sur

ESC seems to bring out some fun billboards, such as this one that I saw outside my hotel room today.

tightly couple and fully integrated

pexif 0.11 released

Thu, 27 Mar 2008 13:22:11 +0000
pexif code python tech

I released a new version of pexif today. This release fixes some small bugs and now deals with files containing multiple application markers. This means files that have XMP metadata now work.

Now I just wish I had time to actually use it for its original purpose of attaching geo data to my photos.

On apostrophes, Unicode and XML

Thu, 28 Feb 2008 12:51:39 +0000
article tech unicode web

So, I started with something reasonably straightforward — update my blog posts so that the <title> tag is set correctly — which quickly led me down the rabbit hole of typographically correct apostrophes, Unicode, XML, encodings, keyboards and input methods. Updating my blog software took about 15 minutes; delving down the rabbit hole took about 5 hours.

So, the apostrophe. This isn’t about the correct usage of the apostrophe. This is entirely about the correct typesetting of the apostrophe. Now, there are lots of opinions on the subject. It basically comes down to a choice between ASCII character 0x27 and Unicode code point U+2019. Of course, it just so happens that ASCII character 0x27 is also Unicode code point U+0027, so really this comes down to a discussion about which Unicode code point is most appropriate for representing the apostrophe. After way too much searching, it actually turns out to be a really simple decision. Unicode provides the documentation for the code points in a series of charts. The chart C0 Controls and Basic Latin (pdf) documents the APOSTROPHE. It is described as:

0027 ' APOSTROPHE 
= apostrophe-quote (1.0) 
= APL quote 
• neutral (vertical) glyph with mixed usage 
• 2019 ’  is preferred for apostrophe 

So, despite the fact that it is named APOSTROPHE, it is described as a neutral (vertical) glyph with mixed usage, and it notes that U+2019 is the preferred code point for apostrophe. This looks pretty conclusive but let’s check the General Punctuation chart:

2019 ’ RIGHT SINGLE QUOTATION MARK 
= single comma quotation mark 
• this is the preferred character to use for apostrophe

So, my conclusion is that the most appropriate character for an apostrophe is U+2019. OK, great, now I have to decide how I can actually encode this. I’m used to writing plain ASCII text documents, and U+2019 is not something I can represent in ASCII. Since I’m mostly concerned about documents I’m publishing on the interwebs, I figured that character entity references would be the way to go. There appears to be a relevant entity:

<!ENTITY rsquo   CDATA "’" -- right single quotation mark, U+2019 ISOnum -->

Of course it seems a little odd using &rsquo; to represent an apostrophe, but so be it. Now, XML defines a new character entity, &apos;, which you might on first glance think is exactly what you want, but on second glance it isn’t, since it maps to U+0027, not U+2019. &apos; is mostly used for escaping strings which are enclosed in actual ' characters. So, &apos; is out. XML itself only defines character entities for ampersand, less-than, greater-than, quotation mark, and apostrophe. XHTML however defines the rest of the character entities that you have come to love and expect from HTML, so &rsquo; is still in, as long as it is used in an XHTML document, not a general XML document.

So I was set on just using &rsquo;, and I sent my page off to the validator. This went fine, except it pedantically pointed out that I had not defined a character encoding, and really I should. Damn, now I need to think about character encoding too. OK, so what options are there? Well, IANA has a nice list of official names for character sets that may be used on the internet.

ANSI_X3.4-1968 (a.k.a. US-ASCII, a.k.a. ASCII) had to be a big first contender, since that is basically what I had been using for many a year, but to be honest, this seemed a little backwards. The idea of having to use numeric character references (NCRs) every time I wanted an apostrophe seemed a little silly. Besides, the W3C recommends using

an encoding that allows you to represent the characters in their normal form, rather than using character entities or NCRs

OK, so since XML spec defines that:

A character reference refers to a specific character in the ISO/IEC 10646 character set

it seems that I really should choose an encoding that can directly encode Unicode code points. (The Unicode standard and ISO/IEC 10646 track each other.) So, what options are there for encoding Unicode? Well, it seems that one of the Unicode transformation formats would be a good choice. But there are so many to choose from: UTF-8, UTF-16, UTF-32, even UTF-9. While UTF-9 was definitely a contender, UTF-8 seems the sanest thing for me. For a start it seems to just-work™ in my editor. So, going with UTF-8, I still need to let other people know my files are encoded in UTF-8. There appear to be a few options for doing this, and the article goes into a long explanation of the various pros and cons. In the end, I just put it into the XML prolog.

Of course, the final piece of the puzzle is actually inputting the characters. OS X seems to have fairly good support for this. If you poke around a bit in the internationalisation support in System Preferences and enable Show Keyboard Viewer, Show Character Viewer and Unicode Hex Input, you should be able to work things out.

So, I can now have lovely typographically correct apostrophes and they work great, and all is good with the world. (Except of course that this page probably renders like crap in Internet Explorer. Oh well.)

Emacs backup files

Tue, 26 Feb 2008 18:47:45 +0000
tech article emacs

The backup files that emacs litters your filesystem with can be a real pain. The stupid tilde files can be annoying and dangerous, especially since ~ does double duty as a shortcut for your home directory. (I can't be the only person who has accidentally typed rm -fr *~ as rm -fr * ~.)

Anyway, the easy solution is to add this to your config file:

(setq backup-directory-alist '(("" . "~/.emacs.d/emacs-backup")))

Are microkernels hardware abstraction layers?

Mon, 25 Feb 2008 16:58:27 +0000
tech article microkernel okl4

In a recent post Gernot made a comparison between nanokernels and hardware abstraction layers (HALs). This prompted a question on the OKL4 developers mailing list: well, couldn’t you consider a microkernel a HAL?.

I think the logical conclusion, both theoretical and practical, is a resounding, no.

Why? Well, a microkernel is, in theory (if not always in practice), minimal. That is, the only things included in the kernel should be those pieces of code that must run in privileged mode.

So, if a microkernel were to provide any hardware abstractions, it would only be providing the abstractions that have to be in the kernel, which really falls short of a complete hardware abstraction layer.

Venn diagram showing overlap between HAL and microkernel properties

Now, probably the more interesting question is: should the microkernel provide any hardware abstraction, and if so, what hardware should it be abstracting, and what is the right abstraction? After starting to write some answers to these questions I reminded myself of the complexity involved in answering them, so I will leave them hanging for another post.

linux.conf.au 2008 Day 1

Mon, 28 Jan 2008 19:01:51 +0000
tech lca linux android

After two weeks in California, I spent two days in Sydney, before flying down to sunny Melbourne yesterday for linux.conf.au 2008.

Monday and Tuesday at linux.conf.au are the miniconf days. The wide variety of topics on display makes things a little difficult. I was back and forward between the embedded and virtualisation miniconfs.

I gave two presentations today, the first this morning was on how to port OKL4 to a new system-on-a-chip. The chip in question is the virtual Goldfish SoC, which forms the core of the emulated platform in Google's Android SDK.

The second presentation was a more high-level talk on why virtualisation is a useful technology not just for large data-centres and server applications, but also for embedded systems, such as mobile phone handsets.

Unfortunately, because I was presenting, I didn't really have much time to focus on some of the other great presentations that went on today. With any luck I'll be able to attend some talks more attentively tomorrow.

For those interested, the talks were filmed, so hopefully videos will be up online in the near future.