Earlier this year I presented at the linux.conf.au embedded miniconf, about how to port OKL4 to a new SoC. The talk was taped and had until recently been available on the linux.conf.au 2008 website, but for some reason that website has gone AWOL, so I thought it was a good time to put up my own copy. These videos have the advantage of having gone through a painstaking post-production phase, which seamlessly melds the slides into the video (well, not quite seamlessly), and also removes all the bad bits.
This presentation gives a really good overview of what is involved in porting OKL4 to a new SoC. However, please note that the specific APIs have been somewhat simplified for pedagogical reasons, so this is more an introduction to the concepts, rather than a tutorial as such.
The videos are available in Ogg/Theora and also Quicktime/H264 formats, and either CIF (352x288) or PAL (720x576). If you can afford the bandwidth I would recommend the hi-res ones, as then you can actually see what is on the screen.
You know that any blog post with musings in the title is going to be a lot of navel-gazing babble, so I don’t blame you if you skip out now; this is mostly for me to consolidate my thoughts on literate programming.
The idea behind literate programming is that you should write programs with a human reader as the primary audience, and the compiler as the secondary audience. This means that you organise your program in a logical order that aids explanation of the program (think chapters and sections), rather than in a way that is oriented towards the compiler (think files and functions).
One of the outcomes of writing your programs in this literate manner is that you think a lot more about how to explain things to another programmer (who is your audience) than you would if you were writing with the compiler as your audience. I’m quite interested in things that can improve the quality of my code personally, and of my team’s code. So I thought I’d try it out.
I first tried a somewhat traditional tool called noweb. I took a fairly complex module of a kernel that I’m writing as the base for this. The output I produced was some quite nice looking LaTeX, which I think did a good job of explaining the code, as well as some of the design decisions that might otherwise have been difficult to communicate to another programmer. I was able to structure my prose in a way that I thought was quite logical for presenting the ideas, but that ended up being quite different to the actual structure of the original code. It is no surprise that the tool that takes the source file and generates the source files used by the compiler is called tangle. Unfortunately I can’t really share the output of this experiment as the code is closed (at the moment).
While I liked the experience of using noweb, it seemed a lot like a case of write the code, then write the documentation; going back to modify the code afterwards would be a real nightmare. There is a lot of evidence (i.e. working bodies of code) that a body of code can be worked on by multiple people at once reasonably effectively. I’m yet to see a piece of literature that can be effectively modified by multiple parties. (And no, Wikipedia doesn’t count.)
One person who agrees is Zed Shaw. He agreed so much that he made his own tool, Idiopidae, which allows you to keep your code as code, and then create documentation describing it, in a mostly literate manner, in a separate file. This seemed like a good alternative, and I tried it out when documenting simplefilemon. Here the documentation is separate, but the code has markers in it so that the documentation can effectively refer to blocks of code, which goes some way to eliminating the problems of traditional literate programming. For a start, syntax highlighting actually worked! (Yes, emacs has a way of doing dual major modes, but none of them really worked particularly well.) Using this approach, I have to admit it felt less literate, which is pretty wishy-washy, but I felt more like I was documenting the code, rather than really taking a holistic approach to explaining the program. That isn’t exactly a good explanation, but it definitely felt different to the other approach. Maybe I felt dirty because I wasn’t following the Knuth religion to the letter. I think this approach probably has more legs, but I did end up with a lot of syntactic garbage in the source file, which made it more difficult to read than it should have been. Also, I couldn’t find a good way of summarising large chunks of code: in noweb, for example, I could present the code for a loop with its body replaced by a reference to another block of code, which was one of its nicest features. Of course that is probably something that can be added to the tool in the future, and isn’t really the end of the world.
Where to go next? Well, I think I’m going to go back and reproduce my original kernel documentation using Idiopidae, to see what the experience is like when modifying only one variable (the tool). If that produces something that looks reasonably good, I think I might invest some time in extending Idiopidae to get it working exactly how I want it to.
While I've ended up using Mac OS X as my primary GUI, I still do a lot of development work on Linux. I'm using VMware Fusion to host a virtual headless Linux machine, which is all good. Recently I decided to upgrade my OS to Ubuntu 8.04, which provides a just-enough OS (JeOS) variant, which seemed perfect for what I wanted to do. Unfortunately the process of getting the VMware client tools installed was less than simple. To cut a long story short, the fix is described by Peter Cooper, and things work well after that. (It is a little annoying that the Ubuntu documentation doesn't explain this, or link to it.)
Anyway, after this I'm able to share my home directory directly between OS X and my virtual machine, which is absolutely fantastic, as I'm no longer using TRAMP or some network filesystem to shuffle files back and forth between the virtual machine and the main machine.
Unfortunately, I ran into a bit of a problem: history was not working in zsh. Specifically, saving the history into the history file was not working, which is a really painful situation. It was not really clear why; running fc -W manually didn't work either, but managed to fail silently, with no stderr output and no error code returned.
Failing this I went back to the massively useful debugging tool strace. This finally gave me the clue that link() (hard linking) was failing, which I confirmed using ln.
So, it turns out that the VMware hgfs file system doesn't support hard linking, which is a real pain, especially since the underlying OS X file system does support it. So I'm down to the workaround of storing my history file in /tmp rather than my home directory, which is slightly annoying, but not the end of the world.
As it turns out I'm not the first to discover this; Roger C. Clermont also found this out a few days ago. With any luck we will find a solution in the near future.
Mac OS X has the kevent() system call, which allows you to monitor various kernel events. This is kind of useful, because I want to, well, watch a file, and then do something when it changes. Now, I would have thought I could find something really simple to do this, but I could only find massive GUI programs, which is not so great for scripting.
Anyway, long story short, I decided to write my own. It was pretty straightforward. I thought it was worth documenting how it works so that Benno 5 years from now can remember how to use kevent.
The first thing you need to do is create a kernel queue using the kqueue() system call. This system call returns a descriptor which you then use on calls to kevent(). These descriptors come out of the file descriptor namespace, but don't actually get inherited on fork().
int kq;

kq = kqueue();
if (kq == -1) {
    err(1, "kq!");
}
After creating the kernel queue, an event is registered. The EV_SET macro is used to initialise the struct kevent. The 1st argument is the address of the event structure to initialise. The 2nd argument is the file descriptor we wish to monitor. The 3rd argument is the type of event we wish to monitor; in this case we want to monitor the file underlying our file descriptor, which is the EVFILT_VNODE filter. The 5th argument is filter-specific flags, in this case NOTE_WRITE, which means we want to get an event when the file is modified. The 4th argument describes what action to perform when the event happens. In particular we want the event added to the queue, so we use EV_ADD | EV_CLEAR. The EV_ADD is obvious, but EV_CLEAR less so. The NOTE_WRITE event is triggered by the first write to the file after registration, and remains set. This means that you would continue to receive the event indefinitely. By using the EV_CLEAR flag, the state is reset, so that an event is delivered only once for each write. (Actually it could be less than once per write, since events are coalesced.) The final arguments are data values, which aren't used for our event.
The kevent system call actually registers the event we initialised with EV_SET. The kevent function takes the kqueue descriptor as the 1st argument. The 2nd and 3rd arguments are a list of events to register (pointer and length); in this case we register the event we just initialised. The 4th and 5th arguments are a list of events to receive (in this case empty). The final argument is a timeout, which is not relevant in this case (as we aren't receiving any events).
struct kevent ke;

EV_SET(&ke,
       /* the file we are monitoring */ fd,
       /* we monitor vnode changes */ EVFILT_VNODE,
       /* when the file is written add an event, and then clear the
          condition so it doesn't re-fire */ EV_ADD | EV_CLEAR,
       /* just care about writes to the file */ NOTE_WRITE,
       /* don't care about value */ 0, NULL);
r = kevent(kq, /* register list */ &ke, 1, /* event list */ NULL, 0, /* timeout */ NULL);
if (r == -1) {
    err(1, "kevent failed");
}
After we have registered our event we go into an infinite loop receiving events. This time we aren't registering any events, so the register list is simply NULL, but the 4th and 5th arguments provide a list of up to 1 event to receive. In this case we still don't want a timeout. We want to check that the event we received was the one we expected, so we assert that it is.
r = kevent(kq,
           /* register list */ NULL, 0,
           /* event list */ &ke, 1,
           /* timeout */ NULL);
if (r == -1) {
    err(1, "kevent");
}
assert(ke.filter == EVFILT_VNODE && ke.fflags & NOTE_WRITE);
The aim of this program is to run a shell command whenever a file changes. Simply getting the write event is not good enough. A program that is updating a file will cause a number of consecutive writes, and since it is likely that our shell command is going to want to operate on a file that is in a consistent state, we want to try and ensure the file is at a quiescent point. UNIX doesn't really provide a good way of doing this. Well, actually, there is a bunch of file locking APIs, but I haven't really used them much, it isn't clear that the program writing the file would be using them, and as far as I can tell, the writer would have to be using the same locking mechanism anyway. Also, the commands I want to run are only going to be reading the file, not writing to it, so at worst I'm going to end up with some broken output until the next write. Anyway, to get something that will work almost all the time, I've implemented a simple debouncing technique. It is a simple loop that waits until the file has not been written to for 0.5 seconds, which is a good tradeoff between latency and ensuring the file is quiescent. Of course it is far from ideal, but it will do.
To implement this, a struct timespec object is created to pass as the timeout parameter to kevent.
struct timespec debounce_timeout;

/* Set debounce timeout to 0.5 seconds */
debounce_timeout.tv_sec = 0;
debounce_timeout.tv_nsec = 500000000;
In the debounce loop, kevent is used again, but this time passed the 0.5 second timeout.
/* debounce */
do {
    r = kevent(kq,
               /* register list */ NULL, 0,
               /* event list */ &ke, 1,
               /* timeout */ &debounce_timeout);
    if (r == -1) {
        err(1, "kevent");
    }
} while (r != 0);
Finally, after the debounce, we run the command that the user specified on the command line. The following code shows the declaration, initialisation and execution of the command.
char *command;

command = argv[2];

system(command);
Using simplefilemon is easy. E.g.: simplefilemon filename "command to run".
You can compile simplefilemon.c with gcc simplefilemon.c -o simplefilemon.
Download: simplefilemon.c
I'm in San Jose at the moment for both the CELF Embedded Linux Conference and the Embedded Systems Conference (ESC). (Which are conveniently scheduled at the same time, in different places!) I'm not quite sure how much of each I'll see. I'm primarily going to be at CELF, but will probably end up spending some time as booth babe at the Open Kernel Labs stand at ESC.
Most importantly there will be beer at Gordon Biersch (San Jose) on Tuesday night from around 7pm. (Not Thursday night as I may have told people previously; of course if anyone wants to meet up on Thursday as well, that works too.)
I did manage to take a quick break from work yesterday and took advantage of the awesome weather in northern California to drive down to Big Sur along Highway 1. It was some pretty spectacular scenery. Hopefully I won't have a sprained ankle next time and will be able to do some hiking.
ESC seems to bring out some fun billboards, such as this one that I saw while driving near my hotel today.
I released a new version of pexif today. This release fixes some small bugs and now deals with files containing multiple application markers. This means files that have XMP metadata now work.
Now I just wish I had time to actually use it for its original purpose of attaching geo data to my photos.
So, I started with something reasonably straight-forward — update my blog posts so that the <title> tag is set correctly — which quickly led me down the rabbit hole of typographically correct apostrophes, Unicode, XML, encodings, keyboards and input methods. Updating my blog software took about 15 minutes, delving down the rabbit hole took about 5 hours.
So, the apostrophe. This isn’t about the correct usage of the apostrophe. This is entirely about correctly typesetting the apostrophe. Now there are lots of opinions on the subject. It basically comes down to the choice between ASCII character 0x27 and Unicode code point U+2019. Of course it just so happens that ASCII character 0x27 is also Unicode code point U+0027, so really, this comes down to a discussion about which Unicode code point is most appropriate for representing the apostrophe. After way too much searching, it actually turns out to be a really simple decision. Unicode provides the documentation for the code points in a series of charts. The chart C0 Controls and Basic Latin (pdf) documents the APOSTROPHE. It is described as:
0027 ' APOSTROPHE = apostrophe-quote (1.0) = APL quote • neutral (vertical) glyph with mixed usage • 2019 ’ is preferred for apostrophe
So, despite the fact that it is named APOSTROPHE, it is described as a neutral (vertical) glyph with mixed usage, and it notes that U+2019 is preferred for apostrophe. This looks pretty conclusive, but let’s check the General Punctuation chart:
2019 ’ RIGHT SINGLE QUOTATION MARK = single comma quotation mark • this is the preferred character to use for apostrophe
So, my conclusion is that the most appropriate character for an apostrophe is U+2019. OK, great, now I have to decide how I can actually encode this. I’m used to writing plain ASCII text documents, and U+2019 is not something I can represent in ASCII. Since I’m mostly concerned about documents I’m publishing on the interwebs, I figured that character entity references would be the way to go. There appears to be a relevant entity:
<!ENTITY rsquo CDATA "’" -- right single quotation mark, U+2019 ISOnum -->
Of course it seems a little odd using &rsquo; to represent an apostrophe, but so be it. Now, XML defines a new character entity, &apos;, which you might on first glance think is exactly what you want, but on second glance it isn’t, since it maps to U+0027, not U+2019. &apos; is mostly used for escaping strings which are enclosed in actual ' characters. So, &apos; is out. XML itself only defines character entities for ampersand, less-than, greater-than, quotation mark, and apostrophe. XHTML however defines the rest of the character entities that you have come to love and expect from HTML, so &rsquo; is still in, as long as it is used in an XHTML document, not a general XML document.
So I was set on just using &rsquo;, and I sent my page off to the validator. This went fine, except it pedantically pointed out that I had not defined a character encoding, and really I should. Damn, now I need to think about character encoding too. OK, so what options are there? Well, IANA has a nice list of official names for character sets that may be used in the Internet.
ANSI_X3.4-1968 (a.k.a. US-ASCII, a.k.a. ASCII) had to be a big first contender, since that is basically what I had been using for many a year, but to be honest, this seemed a little backwards. The idea of having to use numeric character references (NCRs) every time I wanted an apostrophe seemed a little silly. Besides, the W3C recommends using an encoding that allows you to represent the characters in their normal form, rather than using character entities or NCRs.
OK, so since the XML spec defines that:
A character reference refers to a specific character in the ISO/IEC 10646 character set
it seems that I really should choose an encoding that can directly encode Unicode code points. (The Unicode standard and ISO/IEC 10646 track each other.) So, what options are there for encoding Unicode? Well, it seems that one of the Unicode transformation formats would be a good choice. But there are so many to choose from: UTF-8, UTF-16, UTF-32, even UTF-9. While UTF-9 was definitely a contender, UTF-8 seems the sanest thing for me. For a start it seems to just-work™ in my editor. So, going with UTF-8, I still end up needing to let other people know my files are encoded in UTF-8. There appear to be a few options for doing this, each with various pros and cons. In the end, I just put it into the XML prolog.
Of course the final piece of the puzzle is actually inputting characters. OS X seems to have fairly good support for this. If you poke around a bit in the internationalisation support in System Preferences and enable Show Keyboard Viewer, Show Character Viewer and Unicode Hex Input, you should be able to work things out.
So, I can now have lovely typographically correct apostrophes and they work great, and all is good with the world. (Except of course that this page probably renders like crap in Internet Explorer. Oh well.)
The backup files that emacs litters your filesystem with can be a real pain. Stupid tilde files can be annoying and dangerous, especially since ~ does double duty as a shortcut for your home directory. (I can't be the only person who has accidentally typed rm -fr *~ as rm -fr * ~.) Anyway, the easy solution is to add this to your config file:
(setq backup-directory-alist '(("" . "~/.emacs.d/emacs-backup")))
In a recent post Gernot made a comparison between nanokernels and hardware abstraction layers (HALs). This prompted a question on the OKL4 developers mailing list: well, couldn’t you consider a microkernel a HAL?
I think the logical conclusion, both theoretical and practical, is a resounding no.
Why? Well, a microkernel is, in theory (if not always in practice), minimal. That is, the only things that should be included in the kernel are those pieces of code that must run in privileged mode.
So, if a microkernel were to provide any hardware abstractions, it would only be providing the abstractions that have to be in the kernel, which really falls short of a complete hardware abstraction layer.
Now, probably the more interesting questions are: should the microkernel provide any hardware abstraction, and if so, what hardware should it be abstracting, and what is the right abstraction? After starting to write some answers to these questions I reminded myself of the complexity involved in answering them, so I will leave them hanging for another post.