For a little personal project I’ve been working on recently, I needed to create some unit tests for a CGI script, sorry, that should be, a cloud computing application. Of course, I turned to my trusty hammer, Python, and was a little disappointed to discover it didn’t quite have the batteries I needed included.
I kind of thought that urllib2 would more or less do what I wanted out of the box, but unfortunately it didn’t and (shock, horror!) I actually needed to write some code myself! The first problem I ran into is that urllib2 only supports GET and POST out of the box. HTTP is constrained enough in the verbs it provides, so I really do want access to things like PUT, DELETE and HEAD.
The other problem I ran into is that I didn’t want things automatically redirecting (although clearly this would be the normal use-case), because I wanted to check that I got a redirect in certain cases.
The final problem I had is that only status code 200 is treated as success; other 2xx codes raise exceptions. This is generally not what you want, since 201 is a perfectly valid return code, indicating that a new resource was created.
So, urllib2 is billed as “an extensible library for opening URLs using a variety of protocols”; surely I can just extend it to do what I want? Well, it turns out that I can, but it seemed to be harder than I was expecting. Not because the final code I needed to write was difficult or involved, but because it was quite difficult to work out what the right code to write is. I want to explore for a little bit why (I think) this might be the case.
urllib2 is quite nice, because simple things (fetch a URL, follow redirects, POST data) are very easy to do:
import urllib2

ret = urllib2.urlopen("http://some_url/")
data = ret.read()
And it is definitely possible to do more complex things, but (at least for me) there is a sharp discontinuity in the API: learning the easy API doesn’t help you learn the more complex API, and the documentation (at least as far as I read it) doesn’t make it apparent that there are really two modes of operation.
The completely frustrating thing is that the documentation in the source file is much better than the online documentation, since it talks about some of the things that happen in the module which are otherwise “magic”.
For example, the build_opener function is pretty magical, since it does a bunch of introspection and ends up either adding a handler or replacing a handler depending on the class. This is explained in the code as: “if one of the argument is a subclass of the default handler, the argument will be installed instead of the default”, which to me makes a lot of sense, whereas the online documentation describes it as: “Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ...<list of default handlers>”. For me the former is much clearer than the latter!
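To make that concrete, here is a small sketch (my naming, not code from the module) of how the subclass-replaces-default behaviour can be used to stop automatic redirects:

import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means the redirect is not followed; the 3xx
    # response is then surfaced to the caller as an HTTPError,
    # which a unit test can catch and inspect.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

# Because NoRedirectHandler is a subclass of the default
# HTTPRedirectHandler, build_opener installs it *instead of* the default.
opener = urllib2.build_opener(NoRedirectHandler())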
Anyway, here is the code I ended up with:
opener = urllib2.OpenerDirector()
opener.add_handler(urllib2.HTTPHandler())

def restopen(url, method=None, headers=None, data=None):
    if headers is None:
        headers = {}
    if method is None:
        if data is None:
            method = "GET"
        else:
            method = "POST"
    return opener.open(HttpRestRequest(url, method=method,
                                       headers=headers, data=data))
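The HttpRestRequest class isn’t shown above; a minimal sketch of what it needs to look like, assuming only the standard urllib2.Request interface, is something like:

import urllib2

class HttpRestRequest(urllib2.Request):
    # urllib2.Request picks GET or POST based on whether data is set;
    # overriding get_method() lets us issue PUT, DELETE, HEAD, etc.
    def __init__(self, url, method=None, headers=None, data=None):
        urllib2.Request.__init__(self, url, data=data,
                                 headers=headers or {})
        self.method = method

    def get_method(self):
        if self.method is not None:
            return self.method
        return urllib2.Request.get_method(self)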
So, conclusions: if the docs don’t make sense, read the code. If you are writing an API, try to make it easy to get from the easy case to the more complex case. (This is quite difficult to do, and I have definitely been guilty in the past of falling into the same trap in APIs I have provided!) If you can’t get the API to transition easily from easy to hard, make sure you document it well. Finally, Python is a great language for accessing services over HTTP, even if it does require a little bit of work to get the framework in place.
I’ve been setting up my new Android phone and found out that the password handling code in the IMAP client doesn’t work so well with my Dovecot IMAP server.
Now, I really can’t be bothered working out which side is doing it wrong, but Dovecot expects your password to be contained in double-quotes if it contains special characters. (No, I don’t precisely know what those characters are!) And, of course, if you are double-quoting the string, you then need to escape any double-quotes in the password itself: a password like pa"ss has to be sent as "pa\"ss". Now, like I said, I have no idea if this is the correct (according to the standard) behaviour, but it is the behaviour that I have to deal with, since I can’t mess with the server. Of course, the Android IMAP client doesn’t do this escaping, and so you get an error message indicating your username / password is incorrect. Frustratingly, the error message doesn’t pass on the information from the server giving detail on exactly what the problem is, so you are left guessing.
Anyway, it turns out that if you manually escape the password you give to the Android email client, things work fine. Of course, the SMTP server doesn’t need this escaping, and fails if you do have it, so you need to type in a different, unescaped, password for SMTP. Fun and games!
Looking at the latest source code for Android, it looks like this has been properly fixed, so hopefully an upgrade to the Cupcake branch in the near future will solve the problem.
This post is basically a crash course in using gdb to run and debug low-level code. Learning to use a debugger effectively can be much more powerful than ghetto printf() and putc() debugging. I should point out that I am far from a power gdb user, and am usually much more comfortable with printf() and putc(), so this is very much a beginner’s guide, written by a newbie. With those caveats in mind, let’s get started.
So the first thing to do is to get our target up and running. For this our target will be a virtual device running on Skyeye. When you start up Skyeye you pass it the -d flag, e.g: $ skyeye -c config.cfg -d. This will halt the virtual processor and provide an opportunity to attach the debugger. The debugger will be available on a TCP socket, which defaults to port 12345. Of course a decent JTAG adapter should be able to give you the same type of thing with real hardware.
Now, you run gdb: $ arm-elf-gdb. Once gdb is running you need to attach to the target; to do this we use: (gdb) target remote :12345. Now you can start the code running with (gdb) continue.
Now, just running the code isn’t very useful; you can do that already. If you are debugging, you probably want to step through the code. You do this with the step command. You can step through code a line at a time, or an instruction at a time. At the earliest stages you probably want to use the si command to step through instruction by instruction.
To see what your code is doing you probably want to be able to display information. For low-level start-up code, being able to inspect the register and memory state is important. You can look at the registers using the info registers command, which prints out all the general-purpose registers as well as the program counter and status registers.
For examining memory, the x command is invaluable. The examine command takes a memory address as an argument (actually, it can be a general expression that evaluates to a memory address). The command has some optional arguments. You can choose the number of units to display, the format to display memory in (hex (x), decimal (d), binary (t), character (c), instruction (i), string (s), etc), and also the unit size (byte (b), halfword (h), word (w)). So, for example, to display the first five words in memory as hex we can do: (gdb) x /5x 0x0. If we want to see the values of individual bytes as decimal we could do: (gdb) x /20bd 0x0. Another common example is to display the next 5 instructions, which can be done with (gdb) x /5i $pc. The $pc expression returns the value in the pc register.
Poking at bits and bytes and stepping an instruction at a time is great for low-level code, but gdb can end up being a lot more useful if it knows a little bit more about the source code you are debugging. If you have compiled the source code with the -g option, your ELF file should have the debugging information you need embedded in it. You can let gdb know about this file using (gdb) symbol-file program.elf. Now that you actually have symbols and debugging information, you can use the normal step command, and it will step through lines of source code (rather than instructions).
The other nice thing is that you can easily set breakpoints and watchpoints. (You don’t have to have source debugging enabled for this, but it makes things a lot easier!) Setting a breakpoint is easy: you can set it on a line, e.g: (gdb) break file.c:37, or on a particular function, e.g: (gdb) break schedule. Breakpoints are neat, but watchpoints are even cooler, since you can test for a specific condition, e.g: (gdb) watch mask < 2000.
Now that you have these nice watchpoints and breakpoints, you will probably find that most of the time you just end up printing out the same few variables each time you hit the point. To avoid this you can use the display command. Each expression you install with the display command will be printed each time program execution stops (e.g: when you hit a breakpoint or watchpoint). This avoids a lot of tedious typing!
So, this is of course just scratching the surface. One final thing to consider that will likely make your time using gdb more useful and less painful (i.e: less repetitive typing) is the define command, which lets you create simple little command scripts. The other is that when you start gdb you can pass a command script with the -x option. So, instead of littering your code with printf() statements everywhere, you might want to write some gdb commands that enable a breakpoint and display the relevant data.
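Putting those pieces together, a small command file along these lines can replace a printf() at the top of a function (the schedule and mask names are illustrative only, not from a real project):

# debug.gdb -- load with: arm-elf-gdb -x debug.gdb
target remote :12345
symbol-file program.elf

# a little convenience command for poking around by hand
define regs-and-code
  info registers
  x /5i $pc
end

# stop in the scheduler, and show the interesting state
# automatically every time execution stops
break schedule
display/x mask
continue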
Good luck, and happy debugging!
Yesterday we saw how, with some careful programming and the right compiler flags, we could get GCC to generate optimally small code for our particular function. In this post we take a look at one of the ways in which GCC fails to produce optimal code. Now, there are actually many ways, but the one I want to concentrate on in this post is the use of literal pools.
So, what is a literal pool? I’m glad you asked. The literal pool is an area of memory (in the text segment) which is used to store constants. These constants could be plain numerical constants, but their main use is to store the addresses of variables in the system. These addresses are needed because the ARM instruction set does not have any instructions for directly loading (or storing) an arbitrary address in memory. Instead, ldr and str can only access memory at a 12-bit immediate offset (±4KiB) from a base register. Now, there are lots of ways you could generate code with this restriction; for example, you could ensure your data section is less than 8KiB in size and reserve a register to be used as a base for all data lookups. But such an approach only works if you have a limited data section size. The standard approach is that when a variable is used, its address is written out into a literal pool. The compiler then generates two instructions: the first reads the address from this literal pool, and the second is the instruction that accesses the variable.
So, how exactly does this literal pool work? Well, so that a special register is not needed to point at the literal pool, the compiler uses the program counter (PC) register as the base register. The generated code looks something like: ldr r3, [pc, #28]. That code loads the value at a 28-byte offset from the current value of PC into the register r3. r3 then contains the address of the variable we want to access, and can be used like: ldr r1, [r3, #0], which loads the value of the variable (rather than the address) into r1. Now, as the PC is used as the base for the literal pool access, it should be clear that the literal pool must be stored close enough to the code that needs to use it.
To ensure that the literal pool is close enough to the code using it, the compiler stores a literal pool at the end of each function. This approach works pretty well (unless you have a 4KiB+ function, which would be silly anyway), but can be a bit of a waste.
To illustrate the problem, consider this (contrived) example code:
static unsigned int value;

unsigned int
get_value(void)
{
    return value;
}

void
set_value(unsigned int x)
{
    value = x;
}
Now, while this example is contrived, the pattern involved exhibits itself in a lot of real-world code. You have some private data in a compilation unit (value), and then you have a set of accessor (get_value) and mutator (set_value) functions that operate on the private data. Usually the functions would be more complex than in our example, and usually there would be more than two. So let’s have a look at the generated code:
00000000 <get_value>:
get_value():
   0:   4b01       ldr    r3, [pc, #4]    (8 <get_value+0x8>)
   2:   6818       ldr    r0, [r3, #0]
   4:   4770       bx     lr
   6:   46c0       nop    (mov r8, r8)
   8:   00000000   .word  0x00000000
            8: R_ARM_ABS32  .bss

0000000c <set_value>:
set_value():
   c:   4b01       ldr    r3, [pc, #4]    (14 <set_value+0x8>)
   e:   6018       str    r0, [r3, #0]
  10:   4770       bx     lr
  12:   46c0       nop    (mov r8, r8)
  14:   00000000   .word  0x00000000
            14: R_ARM_ABS32  .bss
You can see that each function has a literal pool (at addresses 0x8 and 0x14). You can also see that there is a relocation associated with each of these addresses (R_ARM_ABS32 .bss). This relocation means that at link time the address of value will be stored at locations 0x8 and 0x14. So, what is the big deal here? Well, there are two problems. First, we have two literal pools containing duplicate data; by storing the address of value twice, we are wasting 4 bytes (remember from yesterday, we have a very tight memory budget and we care where every byte goes). The second problem is that we need to insert a nop in the code (at addresses 0x6 and 0x12), because the literal pool must be aligned.
So, how could the compiler be smarter? Well, if instead of generating a literal pool for each individual function it did it for the whole compilation unit, then instead of having lots of little literal pools with duplicated data throughout, we would have a single literal pool for the whole file. As a bonus, you would only need alignment once as well! Obviously if the compilation unit ends up being larger than 4KiB then you have a problem, but in this case you could still save up producing the literal pool until after 4KiB worth of code. As it turns out, the commercial compiler from ARM, RVCT, does exactly this. So let’s have a look at the code it generates:
00000000 <get_value>:
get_value():
   0:   4802       ldr    r0, [pc, #8]    (c <set_value+0x6>)
   2:   6800       ldr    r0, [r0, #0]
   4:   4770       bx     lr

00000006 <set_value>:
set_value():
   6:   4901       ldr    r1, [pc, #4]    (c <set_value+0x6>)
   8:   6008       str    r0, [r1, #0]
   a:   4770       bx     lr
   c:   00000000   .word  0x00000000
            c: R_ARM_ABS32  .data$0
You can see that the code is more or less the same, but there is just one literal pool right at the end of the file, and no extra nops are needed for alignment. Without merging literal pools we have a .text size of 24 bytes; with the merging we slash that down to 16 bytes.
So merging literal pools is pretty good, but the frustrating thing is that in this example we don’t even need the literal pool! If we examine the final compiled image for this program:
Disassembly of section ER_RO:

00008000 <get_value>:
get_value():
    8000:   4802       ldr    r0, [pc, #8]    (800c <set_value+0x6>)
    8002:   6800       ldr    r0, [r0, #0]
    8004:   4770       bx     lr

00008006 <set_value>:
set_value():
    8006:   4901       ldr    r1, [pc, #4]    (800c <set_value+0x6>)
    8008:   6008       str    r0, [r1, #0]
    800a:   4770       bx     lr
    800c:   00008010   .word  0x00008010

Disassembly of section ER_RW:

00008010 <value>:
    8010:   00000000   .word  0x00000000
You should notice that the actual location of the value variable is 0x8010. At address 0x800c we have the literal pool storing the address of a variable which is in the very next word! If we optimised this by hand, we would end up with something like (I need to verify the offsets):
Disassembly of section ER_RO:

00008000 <get_value>:
get_value():
    8000:   4802       ldr    r0, [pc, #4]    (8008 <set_value+0x8>)
    8002:   4770       bx     lr

00008004 <set_value>:
set_value():
    8004:   6008       str    r0, [pc, #0]
    8006:   4770       bx     lr

Disassembly of section ER_RW:

00008008 <value>:
    8008:   00000000   .word  0x00000000
If we get rid of the literal pool entirely, we save the memory of the literal pool itself (4 bytes), plus the two instructions needed to load values out of the literal pool (4 bytes). This cuts our text size down to a total of only 8 bytes! That is a factor-of-3 improvement over the GCC-generated code. Granted, you are not always going to be able to perform this type of optimisation, but when you care about size, it is quite important. It would be nice if GCC supported a small data section concept, so that you could specify variables that essentially reside within the literal pool instead of needing an expensive (in terms of space and time) indirection.
For this project, it looks like the code will have to be hand-tweaked assembler, which is frustrating, because when you use a person as your compiler, iterations become really expensive, and you want to make sure you get your design right up front.
A project that I’m working on at the moment calls for a very small footprint. This post is about how to make code really small for the ARM architecture.
As you can probably guess, I’m interested in operating system code, so as an example, I’ve taken a very simple piece of operating system code, and stripped it right back to demonstrate some of the techniques to use when optimising for space. So here is the snippet of code we are going to optimise:
01 struct thread {
02     unsigned int notify_bits;
03 };
04
05 unsigned int
06 poll(struct thread *thread, unsigned int mask)
07 {
08     unsigned int result;
09     result = thread->notify_bits & mask;
10     if (result) {
11         thread->notify_bits &= ~result;
12     }
13     return result;
14 }
In this very simple operating system we have threads (thread data is stored in struct thread). Each thread has a set of 32 signals (encoded in a single word, notify_bits). The poll function is used by a thread to determine if it has been sent any signals. The mask parameter is the set of signals that the thread is interested in checking. So, a thread can check if a single signal has been set, or if any signal has been set, or if a specific subset of signals has been set. The function returns the signals that are available (which is simply the bit-wise AND of notify_bits and mask). It is important that the function clears any signals that have been returned. This makes sure that if poll is called twice the same signals are not returned. This is achieved in lines 10–12.
So, our goal is to try and get this code as small as possible. And every byte counts! First off, we just try to compile this with the standard ARM gcc compiler. I’m using version 4.2.2. So we start with: $ arm-elf-gcc -c poll.c -o poll.o. We can then use objdump to work out what the compiler did: $ arm-elf-objdump -dl poll.o.
00000000 <poll>:
poll():
   0:   e1a0c00d   mov    ip, sp
   4:   e92dd800   push   {fp, ip, lr, pc}
   8:   e24cb004   sub    fp, ip, #4      ; 0x4
   c:   e24dd00c   sub    sp, sp, #12     ; 0xc
  10:   e50b0014   str    r0, [fp, #-20]
  14:   e50b1018   str    r1, [fp, #-24]
  18:   e51b3014   ldr    r3, [fp, #-20]
  1c:   e5932000   ldr    r2, [r3]
  20:   e51b3018   ldr    r3, [fp, #-24]
  24:   e0023003   and    r3, r2, r3
  28:   e50b3010   str    r3, [fp, #-16]
  2c:   e51b3010   ldr    r3, [fp, #-16]
  30:   e3530000   cmp    r3, #0          ; 0x0
  34:   0a000006   beq    54 <poll+0x54>
  38:   e51b3014   ldr    r3, [fp, #-20]
  3c:   e5932000   ldr    r2, [r3]
  40:   e51b3010   ldr    r3, [fp, #-16]
  44:   e1e03003   mvn    r3, r3
  48:   e0022003   and    r2, r2, r3
  4c:   e51b3014   ldr    r3, [fp, #-20]
  50:   e5832000   str    r2, [r3]
  54:   e51b3010   ldr    r3, [fp, #-16]
  58:   e1a00003   mov    r0, r3
  5c:   e24bd00c   sub    sp, fp, #12     ; 0xc
  60:   e89da800   ldm    sp, {fp, sp, pc}
So, our first go gets us 100 bytes of code, which is 10 bytes (or 2.5 instructions) for each of our lines of code. We should be able to do better. Well, the first thing we should try is optimisation: $ arm-elf-gcc -c poll.c -o poll.o -O2. This gives us much better code generation output:
00000000 <poll>:
poll():
   0:   e5902000   ldr     r2, [r0]
   4:   e1a0c000   mov     ip, r0
   8:   e0110002   ands    r0, r1, r2
   c:   11e03000   mvnne   r3, r0
  10:   10033002   andne   r3, r3, r2
  14:   158c3000   strne   r3, [ip]
  18:   e12fff1e   bx      lr
So this got us down to 28 bytes. Nearly a factor-of-4 improvement for one compiler flag, not bad. Now, -O2 does some standard optimisations, but -Os will do optimisations aimed specifically at reducing the amount of code. So trying: $ arm-elf-gcc -c poll.c -o poll.o -Os. This gives slightly better code-gen:
00000000 <poll>:
poll():
   0:   e5903000   ldr     r3, [r0]
   4:   e1a02000   mov     r2, r0
   8:   e0110003   ands    r0, r1, r3
   c:   11c33000   bicne   r3, r3, r0
  10:   15823000   strne   r3, [r2]
  14:   e12fff1e   bx      lr
Down to 24 bytes (6 instructions), which is pretty good. Now, as you can see, the generated code uses 32 bits per instruction. Some of the ARM architectures have two distinct instruction sets, ARM and Thumb. The Thumb instruction set uses 16 bits per instruction instead of 32. This denser instruction set can enable much smaller code sizes. Of course there is a trade-off here: the functionality of the 16-bit instructions is going to be less than that of the 32-bit instructions. But let’s give it a try. At the same time, we will tell the compiler the exact CPU we want to compile for (which in our case is the ARM7TDMI-S). The compiler line is: $ arm-elf-gcc -c poll.c -o poll.o -Os -mcpu=arm7tdmi -mthumb. Which produces code like:
00000000 <poll>:
poll():
   0:   6803   ldr     r3, [r0, #0]
   2:   1c02   adds    r2, r0, #0
   4:   1c08   adds    r0, r1, #0
   6:   4018   ands    r0, r3
   8:   d001   beq.n   e <poll+0xe>
   a:   4383   bics    r3, r0
   c:   6013   str     r3, [r2, #0]
   e:   4770   bx      lr
So now we are down to 16 bytes: in Thumb we need 8 instructions (2 more than ARM), but each is only 2 bytes, not 4, so we end up with a one-third improvement. To get any further, we need to start looking at our code again, and see if there are ways of improving it. Looking at the code again:
00 unsigned int
01 poll(struct thread *thread, unsigned int mask)
02 {
03     unsigned int result;
04     result = thread->notify_bits & mask;
05     if (result) {
06         thread->notify_bits &= ~result;
07     }
08     return result;
09 }
Looking at the branch on line 5, you may notice that it is actually redundant. If result is zero, then ~result will be 0xffffffff. Given this, thread->notify_bits &= 0xffffffff will not change the value of thread->notify_bits. So, we can reduce the code to:
00 unsigned int
01 poll(struct thread *thread, unsigned int mask)
02 {
03     unsigned int result;
04     result = thread->notify_bits & mask;
05     thread->notify_bits &= ~result;
06     return result;
07 }
When we compile this we get down to:
00000000 <poll>:
poll():
   0:   6803   ldr     r3, [r0, #0]
   2:   4019   ands    r1, r3
   4:   438b   bics    r3, r1
   6:   6003   str     r3, [r0, #0]
   8:   1c08   adds    r0, r1, #0
   a:   4770   bx      lr
This gets us down to 6 instructions (12 bytes). Pretty good, since we started at 100 bytes. Now let’s look at the object code in a little more detail. If you look at address 0x8, the instruction simply moves register r1 into register r0 so that it is in the right place for the return. (Note: the ARM ABI has the return value stored in r0.) This seems like a bit of a waste; it would be good if there was a way to have the value stored in r0 all along and not waste an instruction just moving values between registers. Now, to get a better understanding of what the instructions are doing, I’m going to slightly rewrite the code, and then compile with debugging, so we can see how the generated code matches up with the source code. So first, let’s rewrite the code a little bit:
00 unsigned int
01 poll(struct thread *thread, unsigned int mask)
02 {
03     unsigned int tmp_notify_bits = thread->notify_bits;
04     mask &= tmp_notify_bits;
05     tmp_notify_bits &= ~mask;
06     thread->notify_bits = tmp_notify_bits;
07     return mask;
08 }
You should convince yourself that this code is equivalent to the existing code. (The code generated is identical.) So, we can now line up the source code with the generated code. Line 03 (unsigned int tmp_notify_bits = thread->notify_bits) matches up with address 0x0 (ldr r3, [r0, #0]). So, register r3 is used to store the variable tmp_notify_bits, and the thread parameter is stored in register r0. Line 04 (mask &= tmp_notify_bits) matches directly with address 0x2 (ands r1, r3); register r1 corresponds directly to the mask parameter. The important part of restructuring the code as we have done is that it becomes obvious we can use the mask parameter directly, instead of needing an extra variable as in the previous code. Continuing, line 05 (tmp_notify_bits &= ~mask) matches directly to 0x4 (bics r3, r1). The bics instruction is quite neat in that it can essentially do the &= ~ in one 16-bit instruction. Line 06 (thread->notify_bits = tmp_notify_bits), which stores the result back to memory, matches directly to 0x6. Now, the final line of code (return mask;) needs two instructions, 0x8 and 0xa (adds r0, r1, #0 and bx lr). The reason we need two instructions is that mask is stored in register r1, but the return value needs to be in register r0. So how can we get mask to be stored in r0 all along? Well, if we switch the parameters around, poll(unsigned int mask, struct thread *thread), the mask will be stored in r0 instead of r1. (Note: we don’t necessarily have to change the interface of the function. If we want to keep the interface we can use a simple macro to textually swap the parameters, as sketched after the disassembly below.) If we compile this, we get the following generated code:
00000000 <poll>:
poll():
   0:   680b   ldr     r3, [r1, #0]
   2:   4018   ands    r0, r3
   4:   4383   bics    r3, r0
   6:   600b   str     r3, [r1, #0]
   8:   4770   bx      lr
So, we have got this down to 5 instructions (10 bytes). That is a factor-of-10 improvement over where we started, not bad!
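The parameter-swapping macro mentioned above might look something like this (a sketch; poll_impl is a hypothetical name for the reordered implementation):

/* The real implementation takes (mask, thread) so that mask arrives
 * in r0, already in place to be the return value; the macro preserves
 * the original (thread, mask) interface for callers. */
unsigned int poll_impl(unsigned int mask, struct thread *thread);
#define poll(thread, mask) poll_impl((mask), (thread))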
So, a bit of a review:

- Use -Os to optimise for space.
- Use the Thumb instruction set (-mthumb) if your CPU supports it.
- Read the generated code, and restructure the source to help the compiler.
- Know your ABI, and order parameters so that values are already in the registers they need to end up in.

The UNIX® shell is one of the tools (along with Emacs) that I never
feel like I’m using to the fullest. This morning was another opportunity
to expand my shell repertoire. Most of my source code is stored in
mercurial repositories, and I was sick of doing % find . | grep -v \.hg
to strip out the meta-data. Surely there is a better way? Note: I’m
using zsh and GNU tools on Mac OS X, although I’m
pretty sure that most of the examples should work fine in Bourne-like shells on other
UNIX-like operating systems.
find and -prune
So the thing that sucks about the find . | grep -v <foo> type pattern is that find will do all the work of descending into the foo directory and scanning it. It also means you can’t take advantage of other find niceties, such as the -exec flag. So, if only there was a way to prune our directory tree search. Now, the find expression language is a mighty strange language to code in, or at least it is to me. The language consists of a set of primaries and a set of operators; an expression is made up of primaries linked together with operators. Now, there are a few things that make this less than obvious, especially if you have been naïvely using it. The first is that in almost all cases, the find utility automatically adds a -print primary to your expression. The other big one (for me at least) is that expression expression is short-hand for expression -and expression. To me, grokking this made a huge difference to my understanding of how find actually works.
The other part to grokking find is that some expressions are what I would call filters: the expression returns true if the file matches some property, e.g: -name, -newer, etc. Then there is a class of command expressions that have some type of external side-effect; examples are: -print, -print0, -exec, -ok. Next is the -prune command, which is in a special category all of its own, because it modifies find’s internal search process. Finally, there are what I would call global expressions, such as -maxdepth and -mindepth. These are global because each applies to the entire expression, even if it would not normally be evaluated. This is pretty strange behaviour if you ask me; surely they would have been better as options rather than expressions!
So with this in mind, let’s build up an expression to do a find, but ignoring my .hg directories. We start with an expression that will just print out the .hg directory. We need something like: find . -type d -name .hg. -type d returns true if the current path is a directory, and -name .hg returns true if the last element of the path is .hg. Since our -and is implicit, this expression only returns true for directories named .hg. And, since we do not specify any command-like expression, find automatically assumes we want to print the path names, so the full expression is really something like: find . \( -type d -and -name .hg \) -print.
So far, this does the exact opposite of what we want: it simply prints the .hg directory. So instead let’s use -prune, which tells find to avoid descending into the directory.
Note: find usually does a pre-order traversal (i.e: a directory’s path comes before the directory’s contents); if you use the -d option, the behaviour is changed to a post-order traversal (i.e: a directory’s path comes after all the directory’s contents). It should be clear that the -prune expression is only going to work with a pre-order traversal. find doesn’t have any way to do a breadth-first traversal.
Since -prune doesn’t count as one of the command expressions, find will still auto-magically mess with our expression to add -print, so our expression becomes: find . \( -type d -and -name .hg -and -prune \) -print. Unsurprisingly, it continues to only output the .hg directory. To get the result we want, we need to do something in the other case, when the path is not a directory named .hg. So, in the other case let’s do a -print, and we end up with the command line: find . -type d -name .hg -prune -o -print. Since there is now a command in the expression, find doesn’t auto-magically add a -print. This does pretty much what we want. Of course, typing out find . -type d -name .hg -prune -o -print is a lot more work than find . | grep -v .hg, so it would be nice to make an alias for this command.
I’m going to have an alias called sfind (short for source find; not the best name, but short and easy on the muscle memory). A first attempt would be to just alias the command we had last, but I’d really like the alias to work mostly like find, so that I can add new arguments and do things like sfind -name "*.c" and have it just work™. Simply aliasing the above command would end up with the final expression being find . -type d -name .hg -prune -o -print -name "*.c", which will do all the printing before it tests whether the filename ends in .c. Another alternative would be to alias find . -type d -name .hg -prune -o, but this forces me to always type an expression; just sfind by itself won’t work. What I really need is a fully formed expression which still works when adding new constraints. What would be ideal is something like: find . -type d -name .hg -prune -o -true, so the other case always evaluates to true. In the case where no arguments are added this will end up being the equivalent of find . \( -type d -name .hg -prune -o -true \) -print, and in the case where further expressions are added it will end up being: find . \( -type d -name .hg -prune -o -true <expression> \) -print.
Unfortunately, there is no expression -true. So, how can we have an expression that always returns true? The best I’ve come up with is -name "*". Another possibility might be -depth +0. This works pretty well, but you might notice that now the .hg directory is being printed! Why? Because now, instead of just printing files in the other case, find is printing whenever the entire expression is true, and since the last part of the left-hand side is -prune, and -prune always returns true, the pruned directory evaluates to true. So how can we stop that? Well, something like find . -type d -name .hg -prune -false -o -name "*" should do the trick. But of course, just as there is no unconditional true, there isn’t an unconditional false either; thankfully, it is trivial to construct one: -not -name "*". So, what we end up with is: find . -type d -name .hg -prune -not -name "*" -o -name "*". Easy!
Now, the next thing I often do is find . | xargs grep "foo" or grep -r "foo" . (which really should be find -print0 | xargs -0, but hey, I’m lazy and my file names don’t contain white-space). Since I was learning more about find, I figured I should work out how to do this without piping to xargs, especially since my new-found power of -o means I might want to run different commands on different files, which wouldn’t work with a simple pipe to stdout. The standard approach is the cryptic expression: -exec grep "foo" {} \;. It is actually pretty straightforward: the braces are replaced by the current pathname, and the command needs to be terminated with a semi-colon (which needs to be escaped in most interactive shells). Now, this isn’t quite as efficient as using xargs, since xargs will run the command on multiple pathnames at once, whereas -exec runs the command once per pathname. Thankfully, modern find seems to have an alternative form: -exec grep "foo" {} +. The plus-sign makes the behaviour of -exec similar to that of xargs, processing multiple pathnames per invocation.
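Side by side, the two forms look like this:

# one grep process per file:
find . -name "*.c" -exec grep "foo" {} \;
# batched, xargs-style, one grep for many files:
find . -name "*.c" -exec grep "foo" {} +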
So what I really want to do is grep for a given pattern in a certain subset of my files, and I’d like to do this without needing to type out the verbose syntax each time. Something like srcgrep <regexp> [find-expression] would be ideal. Now, a simple alias isn’t going to work for this, and I thought I was up for writing yet another tiny shell script. Luckily, I was clued in to shell functions. I feel very ignorant for not knowing about these beforehand, but hey, I’m a systems programmer, not a shell-scripting sysadmin guru. Anyway, long story short, adding an appropriate shell function, such as:
srcgrep() {
    if [ $# -gt 0 ]; then
        RE=$1; shift
        if [ $# != 0 ]; then
            O="("; C=")"
        else
            O=""; C=""
        fi
        sfind -type f $O $@ $C -exec grep $RE {} \+
    else
        echo "Usage: srcgrep <regexp> [expression]"
    fi
}
to my .zshrc does the trick. This shell function ended up being more complex than I initially thought it would. Briefly, it checks there is the correct number of arguments, printing a usage message if necessary. Next it checks if there are any find expressions. When there are, it is necessary to enclose them in brackets, or else the precedence rules break expressions that might have an -or in them. Finally it runs the previously defined sfind alias. The -type f avoids running grep on directories.
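Usage then looks like:

srcgrep foo                # grep every tracked source file for "foo"
srcgrep foo -name "*.c"    # restrict the search to C files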
Now, grep is nice and all, but it can sometimes be hard to pick out the actual match from the surrounding results. This is where colour grep comes in. grep has an option, --color, which will use ANSI escape codes to mark up the output. E.g:
bodyjar:/Users/benno/work/3.2/okl4-mainline% grep --color SORT coverage.sh
declare SORT=""
        F) SORT="-F"
tools/coverage.py -f fifo $SORT -s -S $SCOPE $IMAGE &
tools/coverage.py -f fifo $SORT -s -S $SCOPE $IMAGE 2>&1 | tee $OUTFILE &
Now this is good as far as it goes, but when you start having more complex regular expressions, you get to a point where you only want to colourise part of the match, not the whole thing. For example: ^[A-Za-z_][A-Za-z0-9_]*\( matches lines with function declarations (at least when they follow the coding standard), but what I would really like highlighted is only the function name, not the function name plus opening bracket. This is where I hit the limits of the shell and standard UNIX tools, and ended up back with good old familiar Python. I created a simple grep replacement, colgrep, that by default will colourise in the same manner as grep, but if you use groups, it will only colour the matched groups (and will give each group a different colour). Groups are part of the Python regular expression syntax (and probably Perl’s too). Basically, if a portion of the regular expression is enclosed in brackets, that part of the regexp forms a group. E.g: in SO(R)T the 'R' will be matched as group #1. So for the previous example, if I instead write ^([A-Za-z_][A-Za-z0-9_]*)\(, only the function name will be highlighted. The other neat thing about Python regular expressions is that you can name groups with the syntax (?P<name>expr). I take advantage of this in colgrep to colour groups named "err*" with a red background, to highlight errors.
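The core of such a tool is quite small. This is not the actual colgrep source, just a minimal sketch of the group-colouring idea:

import re
import sys

COLOURS = ["\033[31m", "\033[32m", "\033[33m", "\033[34m"]  # ANSI colours
RESET = "\033[0m"

def colgrep(pattern, stream, out=sys.stdout):
    regex = re.compile(pattern)
    for line in stream:
        match = regex.search(line)
        if match is None:
            continue
        # Colour each group if the pattern has any, otherwise the
        # whole match, cycling through the colour table.
        if regex.groups:
            spans = [match.span(i) for i in range(1, regex.groups + 1)]
        else:
            spans = [match.span()]
        pieces, last = [], 0
        for i, (start, end) in enumerate(spans):
            if start < 0:        # group did not participate in the match
                continue
            pieces.append(line[last:start])
            pieces.append(COLOURS[i % len(COLOURS)] + line[start:end] + RESET)
            last = end
        pieces.append(line[last:])
        out.write("".join(pieces))

if __name__ == "__main__":
    colgrep(sys.argv[1], sys.stdin)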
So in conclusion, we’ve added some powerful commands to our toolbox which make it much more effective to quickly browse and search large revision controlled code bases.
When traveling with my laptop, I find myself on all manner of different wireless networks, and this diverse range of networks always seems to do something funky with SMTP. To get around this I’ve resorted to sending my e-mail through an SSH tunnel. I had previously tried using both SSL and TLS as methods of securely connecting to my upstream mail server, but neither of these methods appeared to work as reliably as a good old SSH tunnel.
On my laptop I use mutt which, unlike most GUI email clients, relies on a mail server running locally on the machine to deliver the mail upstream. The default local mail server on Mac OS is postfix, which works pretty well. In postfix’s main configuration file (main.cf), you can set the relayhost option to use a specific mail server as a relay (instead of delivering mail directly from the laptop). Usually you would set this to your ISP’s outgoing mail server, e.g:

relayhost = [smtp.example.com]
Now, instead of directly contacting smtp.example.com, I want to go through an SSH tunnel. So I change the relay host to:

relayhost = [127.0.0.1]:8025

I use the dotted-decimal address rather than localhost to avoid a DNS lookup and to force things to use IPv4. The square brackets force it to use the host directly, rather than performing any MX lookups. Finally, the last part is the port number. So, now, rather than trying to contact smtp.example.com, the local mail server will try to send any email to port 8025 on the localhost.
So, the next question is how to actually make port 8025 a tunnel to port 25 on the real mail server. One option is to use ssh’s -L option, something like: ssh mail.example.org -L 8025:mail.example.org:25. This works fine, except that it is not very robust. The ssh session will drop out from time to time, and it gives up when moving between networks. The solution is to create the tunnel on demand: each time a program tries to access port 8025 on localhost, we set up the tunnel then. If things were really hammering port 8025 this might be a bit of overhead but, realistically, this isn’t a problem, and things become much more robust and reliable.
On a traditional UNIX system you would use (x)inetd to automatically run a program when a socket is opened. Mac OS X instead uses launchd. A file, /System/Library/LaunchDaemons/mailtunnel.plist, is created, which is our config file for creating the SSH tunnel on demand. It looks something like:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> <plist version="1.0"> <dict> <key>Label</key> <string>au.id.benno.mail</string> <key>UserName</key> <string>benno</string> <key>ProgramArguments</key> <array> <string>/usr/bin/ssh</string> <string>-i</string> <string>tunnel_key</string> <string>-q</string> <string>-T</string> <string>mail.example.com</string> </array> <key>inetdCompatibility</key> <dict> <key>Wait</key> <false/> </dict> <key>Sockets</key> <dict> <key>Listeners</key> <dict> <key>SockServiceName</key> <string>8025</string> </dict> </dict> <key>SessionCreate</key> <true/> </dict> </plist>
The important bits here are the Sockets fragment, where we say that we are going to run the service whenever something connects to port 8025, and the ProgramArguments part, which is a verbose way of specifying the command to run to create the tunnel. In this case we run ssh -i tunnel_key -q -T mail.example.com.
The final part is actually setting up the command to create the tunnel on demand. First we create an ssh key-pair (using ssh-keygen). This is going to be a key without a passphrase, so that we can ssh to our mail server without any interaction requiring a password to be entered. Once you create the keypair, you copy the .pub file across to your mail server, and keep the private one locally. Now, ordinarily, you don’t want a passphraseless key, because it would be very powerful: if someone got access to the key, they would have full access to your account on the mail server. The trick we play is to limit what the key can be used for. Specifically, we ensure that the key can only be used to access port 25 on the mail server. We do this by setting some options when adding the public key to the list of authorized keys on the mail server.
The .ssh/authorized_keys2 file on the mail server contains the list of keys which can be used to gain access to the server via ssh. We want to add our newly created public key to this list, but add some options to limit the scope of what can be done when using the key. The format of the authorized_keys2 file is: <options> ssh-dss <key-data> <comment>. So, instead of just adding the key, we are going to put some options in place:
command="nc localhost 25",no-X11-forwarding,no-agent-forwarding,no-port-forwarding
What these options do is disable any forwarding using the key, and then set the command to run when using the key to be nc localhost 25. nc, a.k.a. netcat, is a very nifty little program that is useful in these scenarios. With this set up, ssh will connect to the mail server, but you won’t get a console, or be able to use any other programs. We can test this out:
$ SSH_AUTH_SOCK= ssh -i tunnel_key mail.example.com
220 mail.example.com ESMTP
quit
221 mail.example.com closing connection
The -i option tells ssh to use a specific key, instead of the default key in .ssh. The SSH_AUTH_SOCK= is important, as it disables the use of any running ssh agent, which would otherwise override your key choice. With that working, you should now be able to attach directly to the tunnel using telnet localhost 8025.
So, my earlier post on this was a little premature; anyone who has tried out the code has found out that it pretty much doesn’t work (hey I did warn you!). Now there are a range of fun reasons why this didn’t work, most of which I’ve now solved.
Firstly, it turns out that EABI and ARMv4T are pretty much incompatible. (I’ll post separately about that!) In short, Thumb interworking doesn’t (can’t) work, so I’ve reverted back to the plain old ARMv4 architecture as my target (the only difference between ARMv4 and ARMv4T is the Thumb stuff, which we can’t use until the compiler / spec is fixed). So I’ve updated linux-arm.mk to support ARMv4 for now as well.
Of course, the next problem this introduces is that the bx instruction doesn’t exist on ARMv4, and GCC (helpfully) complains and stops the compilation. Now, a bx without Thumb support is simply a mov pc, instruction, so I went through and provided a BX macro that expands to either bx or mov pc,. This is a little bit nasty/invasive because it touches all the system call bindings; thankfully these are generated anyway, but it makes the diff quite large. (When I have time I’ll make the generation part of the build system, not a manual process.)
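The idea, roughly (a sketch of the macro, not the actual patch; the __ARM_ARCH_4__ define comes from the build-system change shown later):

/* In the assembler bindings: return via bx where it exists,
 * and via a plain mov to pc on ARMv4, where it doesn't. */
#if defined(__ARM_ARCH_4__)
#define BX(reg)   mov pc, reg
#else
#define BX(reg)   bx reg
#endif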
The next problem is that the provided compiler’s libgcc library is built for ARMv5, and has instructions that just don’t exist on ARMv4 (such as clz), so I went and built a new compiler targeted at ARMv4. There is no reason why this couldn’t be set up as a multi-lib compiler that supports both, but I don’t have enough GCC wizardry in me to work that out right now. So, a new compiler.
This got things to a booting stage, but not able to mount /system or /data. Basically, Android by default uses yet another flash file-system (YAFFS), but for some reason, which I couldn’t fully work out initially, the filesystem just didn’t seem to cleanly initialise and then mount. So, without diving too deep, I figured I could just use jffs2 instead, which I know works on the target. So I upgraded the Android build system to allow you to choose which filesystem type to use, providing jffs2 as an option. This was going much better, and I got a lot further; far enough that I needed to recompile my kernel with support for some of the Android-specific drivers like ashmem, binder and logger. Unfortunately I was then getting a hang on an mmap call, for reasons that I couldn’t quite work out. After a lot of tedious debugging (my serial console is broken, so I have to rely on the graphics console, which is really just an insane way to try and debug anything), it turns out that part of what the Dalvik virtual machine does when optimising class files is to mmap the file as writable memory. This was what was failing, with the totally useless error “invalid argument”. Do you know how many unique paths through the mmap system call can set EINVAL? Well, it’s a lot. Anyway, long story short, it turns out that the jffs2 filesystem doesn’t support writable mmaps! %&!#.
After I finished cursing, I decided to go back to using YAFFS and work out what the real problem was. After upgrading u-boot (in a pointless attempt to fix my serial console), I noticed a new write yaffs[1] command. This wasn’t there in the old version. OK, cool, maybe this has something to do with the problem. But what is the deal with yaffs versus yaffs1? Well, it turns out that NAND comes with different page sizes, 512 bytes and 2KiB (or multiples thereof, maybe??). And it turns out that YAFFS takes advantage of this and has different file systems for different-sized NAND pages. And of course, since everything that can go wrong will, the filesystem image that the build system creates is YAFFS2, which is for 2KiB pages, not 512-byte pages. So, I again updated the build system, firstly to build both the mkyaffs2image and the mkyaffsimage tools, and then set off building a YAFFS file system.
Now, while u-boot supports the YAFFS filesystem, device firmware upgrade (DFU) doesn’t (appear to). So this means I need to copy the image to memory first, then on the device copy it from memory to flash. Now, the other fun thing is that DFU can only copy 2MB or so to RAM at a time, and the system.img file is around 52MB or so, which means it takes around 26 individual copies of 2MB sections.... very, very painful. But in the end this more or less worked. So now I have a 56MB partition for the system, and a 4MB partition for the user, and things are looking good.
Good, that is, right up until the point where Dalvik starts up and writes out cached versions of class files to /data. You see, it needs more than 4MB, a lot more, so I’m kind of back to square one. I mean, if I’d looked at the requirements I would have read 128MB of flash, but meh, who reads requirements? The obvious option would be some type of MMC card, but as it turns out, the number of handy Fry’s stores on a Boeing 747 from Sydney to LA numbers in the zeroes.
So: the /system partition is read-only, and since the only problem with jffs2 was when we were writing to it, it seems that we could use jffs2 for the read-only system partition. This has the advantage that jffs2 does compression, so the image fits in about 30MB, not about 50MB, leaving plenty of room for the user data partition, which is where the Dalvik cached files belong. It also has the advantage of being able to use normal DFU commands to install the image (yay!). So after more updates to the build system, to support individually setting the system filesystem type and the user filesystem type, things seem a lot happier.
Currently, I have a system that boots init, starts up most of the system services, including the Dalvik VM, and runs a bunch of code, but bombs out with an out-of-memory error in the pixelflinger code, which I’ve yet to have any luck tracing. My serial console is currently fubar, so I can’t get any useful logging, which makes things doubly painful. The next step is to get adb working over USB, so I have at least an output of the errors and warnings, which should give me half a chance of tracking down the problem.
So if you want to try and get up to this point, what are the steps? Well, firstly go and download the Android toolchain source code and compile it for a v4 target. You use the --target=armv4-android-eabi argument to configure, if I remember correctly.
Once you have that done, grab my latest patch and apply it to the Android source code base. (That is a tar file with diffs for each individual project; applying them correctly is left as an exercise for the reader.) Then you want to compile it with the new toolchain. I use a script like this:
#!/bin/sh
make TARGET_ARCH_VERSION=armv4 \
     MKJFFS2_CMD="ssh nirvana -x \"cd `pwd`; mkfs.jffs2\"" \
     SYSTEM_FSTYPE=jffs2 \
     USERDATA_FSTYPE=yaffs \
     TARGET_TOOLS_PREFIX=/opt/benno/bin/armv4-android-eabi- $@
Things you will need to change are the tools prefix and the mkjffs2 command. The evil hackery above runs mkfs.jffs2 on my Linux virtual machine (I’m compiling the rest under OS X, and I can’t get mkfs.jffs2 to compile under it yet).
After some time passes you should end up with ramdisk.img, userdata.img and system.img files. The next step is to get a usable kernel.
I’m using the OpenMoko stable kernel, which is 2.6.24 based. I’ve patched this with bits of the Android kernel (enough, I think, to make it run). Make sure you configure support for yaffs, binder, logger and ashmem. Here is the kernel config I’m currently using.
At this stage it is important that you have a version of u-boot supporting the YAFFS write commands; if you don’t, your next step is to install that. After this, the next step is to re-partition your flash device. In case it isn’t obvious, this will trash your current OS. The useful parts from my u-boot environment are:
mtdids=nand0=neo1973-nand
bootdelay=-1
mtdparts=mtdparts=neo1973-nand:256k(uboot)ro,16k(uboot-env),752k(ramdisk),2m(kernel),36m(system),24m(userdata)
rdaddr=0x35000000
kaddr=0x32000000
bootcmd=setenv bootargs ${bootargs_base} ${mtdparts} initrd=${rdaddr},${rdsize}; nand read.e ${kaddr} kernel; nand read.e ${rdaddr} ramdisk; bootm ${kaddr}
bootargs_base=root=/dev/ram rw console=tty0 loglevel=8
Note the mtdparts, which defines the partitions, and the bootcmd. (I’m not entirely happy with the boot command, mostly because when I install a new ramdisk image I need to manually update $rdsize, which is a pain.)
With this in place you are ready to start. The first image to move across is your userdata image. To make this happen, we first copy it into memory using dfu-util:

sudo dfu-util -a 0 -R -D source/out/target/product/generic/userdata.img
Then you need to use the nand write.yaffs1 command to copy it to the data partition. Note, at this stage I get weird behaviour; I’m not convinced that the YAFFS support truly works yet! Afterwards I get some messed-up data in other parts of the flash (which is why we are doing it first). After you have copied it in, I suggest resetting the device, and you may find you need to reinitialise u-boot (using dyngen, and setting up the environment again as above).
After this you are good to use dfu-util to copy across the kernel, system.img and ramdisk.img. After copying the ramdisk.img across, update the rdsize variable with the size of the ramdisk.
Once all this is done, you are good to boot; I wish you luck! If you have a working serial console you can probably try the logcat command to see why graphics aren’t working. If you get this far, please email me the results!
After a lot of stuffing around installing new hard drives so I had enough space to actually play with the source code, and getting screwed by Time Machine when trying to convert my filesystem from case-insensitive to case-sensitive (I gave up and am now using a case-sensitive disk image on top of my case-insensitive file system.. sigh), I finally have the Android source code compiling, yay!
Compiling is fairly trivial: just make and away it goes. The fun thing is trying to work out exactly what the hell the build system is actually doing. I’ve got to admit, though, it is a pretty clean build system, although it isn’t going to win any speed records. I’m going to go into more detail on the build system when I have more time, and when I’ve actually worked out what the hell is happening.
Anyway, after a few false starts I now have the build system compiling for ARMv4T processors (such as the one inside the Neo1973), and hopefully at the same time I haven’t broken compilation for ARMv5TE.
For those interested, I have a patch available. Simply apply this to the checked-out code, and then build using make TARGET_ARCH_VERSION=armv4t. Now, of course, I haven’t actually tried to run this code yet, so it might not work, but it compiles fine, and that is a good start! Once I work out how to make git play nice I’ll actually put this into a branch and make it available, but the diff will have to suffice for now. Of course, I’m not the only one looking at this; check out Christopher’s page for more information. (Where he actually starts solving some problems instead of just working around them ;)
The rest of this post documents the patch. For those interested it should give you some idea of the build system and layout, and hopefully it is something that can be applied to mainline.
The first changes are to the linux-arm.mk file. A new make variable, TARGET_ARCH_VERSION, is added. For now this defaults to armv5te, but it can be overridden on the command line as shown above.
project build/
diff --git a/core/combo/linux-arm.mk b/core/combo/linux-arm.mk
index adb82d3..a43368f 100644
--- a/core/combo/linux-arm.mk
+++ b/core/combo/linux-arm.mk
@@ -7,6 +7,8 @@ $(combo_target)TOOLS_PREFIX := \
 prebuilt/$(HOST_PREBUILT_TAG)/toolchain/arm-eabi-4.2.1/bin/arm-eabi-
 endif
 
+TARGET_ARCH_VERSION ?= armv5te
+
 $(combo_target)CC := $($(combo_target)TOOLS_PREFIX)gcc$(HOST_EXECUTABLE_SUFFIX)
 $(combo_target)CXX := $($(combo_target)TOOLS_PREFIX)g++$(HOST_EXECUTABLE_SUFFIX)
 $(combo_target)AR := $($(combo_target)TOOLS_PREFIX)ar$(HOST_EXECUTABLE_SUFFIX)
The next thing is to make the GLOBAL_CFLAGS variable dependent on the architecture version. The armv5te defines stay in place, but an armv4t architecture version is added. Most of the cflags are pretty similar, except that we change the -march flag and the pre-processor defines. These defines become important later in the patch, as they provide the mechanism for distinguishing between versions in the code.
@@ -46,6 +48,7 @@ ifneq ($(wildcard $($(combo_target)CC)),)
 $(combo_target)LIBGCC := $(shell $($(combo_target)CC) -mthumb-interwork -print-libgcc-file-name)
 endif
 
+ifeq ($(TARGET_ARCH_VERSION), armv5te)
 $(combo_target)GLOBAL_CFLAGS += \
         -march=armv5te -mtune=xscale \
         -msoft-float -fpic \
@@ -56,6 +59,21 @@ $(combo_target)GLOBAL_CFLAGS += \
         -D__ARM_ARCH_5__ -D__ARM_ARCH_5T__ \
         -D__ARM_ARCH_5E__ -D__ARM_ARCH_5TE__ \
         -include $(call select-android-config-h,linux-arm)
+else
+ifeq ($(TARGET_ARCH_VERSION), armv4t)
+$(combo_target)GLOBAL_CFLAGS += \
+        -march=armv4t \
+        -msoft-float -fpic \
+        -mthumb-interwork \
+        -ffunction-sections \
+        -funwind-tables \
+        -fstack-protector \
+        -D__ARM_ARCH_4__ -D__ARM_ARCH_4T__ \
+        -include $(call select-android-config-h,linux-arm)
+else
+$(error Unknown TARGET_ARCH_VERSION=$(TARGET_ARCH_VERSION))
+endif
+endif
 
 $(combo_target)GLOBAL_CPPFLAGS += -fvisibility-inlines-hidden
The next bit to update is the prelink-linux-arm.map file. The dynamic libraries in Android are laid out explicitly in virtual memory according to this map file. If I’m not mistaken, those addresses look suspiciously 1MB-aligned, which means they should fit nicely in the pagetable, and provide some opportunity to use fast-address-space-switching techniques. In the port to ARMv4 I have so far been lazy: instead of fixing up any assembler code I’ve just gone with the existing C code. One outcome of this is that I need libffi.so for my foreign function interface, so I’ve added it to the map for now. I’m not 100% sure that this won’t cause a problem when compiling for ARMv5; we will need to see. Fixing up the code to avoid needing libffi is probably high on the list of things to do.
diff --git a/core/prelink-linux-arm.map b/core/prelink-linux-arm.map
index d4ebf43..6e0bc43 100644
--- a/core/prelink-linux-arm.map
+++ b/core/prelink-linux-arm.map
@@ -113,3 +113,4 @@ libctest.so         0x9A700000
 libUAPI_jni.so      0x9A500000
 librpc.so           0x9A400000
 libtrace_test.so    0x9A300000
+libffi.so           0x9A200000
The next module is bionic, the lightweight C library that is part of Android. This has some nice optimised routines for memory copy and compare, but unfortunately they rely on ARMv5 instructions. I've changed the build system to use the optimised assembler only when compiling for ARMv5TE, falling back to the C routines in the other cases. (The strlen implementation isn't pure assembly, but the optimised C implementation has inline asm, so again it needs to drop back to a plain old dumb strlen.)
project bionic/

diff --git a/libc/Android.mk b/libc/Android.mk
index faca333..3fb3455 100644
--- a/libc/Android.mk
+++ b/libc/Android.mk
@@ -206,13 +206,9 @@ libc_common_src_files := \
 	arch-arm/bionic/_setjmp.S \
 	arch-arm/bionic/atomics_arm.S \
 	arch-arm/bionic/clone.S \
-	arch-arm/bionic/memcmp.S \
-	arch-arm/bionic/memcmp16.S \
-	arch-arm/bionic/memcpy.S \
 	arch-arm/bionic/memset.S \
 	arch-arm/bionic/setjmp.S \
 	arch-arm/bionic/sigsetjmp.S \
-	arch-arm/bionic/strlen.c.arm \
 	arch-arm/bionic/syscall.S \
 	arch-arm/bionic/kill.S \
 	arch-arm/bionic/tkill.S \
@@ -274,6 +270,18 @@ libc_common_src_files := \
 	netbsd/nameser/ns_print.c \
 	netbsd/nameser/ns_samedomain.c
+
+ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
+libc_common_src_files += arch-arm/bionic/memcmp.S \
+	arch-arm/bionic/memcmp16.S \
+	arch-arm/bionic/memcpy.S \
+	arch-arm/bionic/strlen.c.arm
+else
+libc_common_src_files += string/memcmp.c string/memcpy.c string/strlen.c string/ffs.c
+endif
+endif
+
 # These files need to be arm so that gdbserver
 # can set breakpoints in them without messing
 # up any thumb code.
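To give a feel for what the plain C fallbacks look like, here is a minimal byte-by-byte memcmp in the style of the generic string/ routines (an illustrative sketch, not the actual bionic source):

    #include <stddef.h>

    /* Compare n bytes; returns <0, 0 or >0, like the standard memcmp. */
    int memcmp(const void *s1, const void *s2, size_t n)
    {
        const unsigned char *p1 = s1;
        const unsigned char *p2 = s2;

        while (n--) {
            if (*p1 != *p2)
                return *p1 - *p2;
            p1++;
            p2++;
        }
        return 0;
    }

The assembler versions win by comparing a word or more at a time; the portable version trades that speed for running on anything.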
Unfortunately, it is clear that this C-only code hasn't been used in a while, as it contained a trivial bug, fixed by the patch below. This makes me worry about what other bugs that the compiler can't catch may be lurking.
diff --git a/libc/string/memcpy.c b/libc/string/memcpy.c
index 4cd4a80..dea78b2 100644
--- a/libc/string/memcpy.c
+++ b/libc/string/memcpy.c
@@ -25,5 +25,5 @@
  * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
-#define MEM_COPY
+#define MEMCOPY
 #include "bcopy.c"
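The bug is easy to make because of the way these BSD-derived routines are built: bcopy.c is a single template file compiled several times, with a pre-processor define selecting which function it becomes. Roughly like this (a simplified sketch of the pattern, not the exact bionic source):

    #include <stddef.h>

    /* bcopy.c (simplified): one body, several names */
    #ifdef MEMCOPY
    void *memcpy(void *dst, const void *src, size_t len)
    #elif defined(MEMMOVE)
    void *memmove(void *dst, const void *src, size_t len)
    #else
    void bcopy(const void *src, void *dst, size_t len)
    #endif
    {
        /* ... shared copying implementation ... */
    }

With MEM_COPY misspelled, memcpy.c presumably compiled quietly as yet another bcopy(), and nothing defined memcpy() at all, which is exactly the kind of error the compiler itself won't complain about.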
Finally, and frustratingly, the compiler's ffs()
implementation
appears to fall back to calling the C library's ffs()
implementation
if it can't do something optimised. This happens when compiling for ARMv4, so I've
added an ffs() implementation (stolen from FreeBSD):
#include <sys/cdefs.h>
#include <strings.h>

/*
 * Find First Set bit
 */
int
ffs(int mask)
{
	int bit;

	if (mask == 0)
		return (0);
	for (bit = 1; !(mask & 1); bit++)
		mask = (unsigned int)mask >> 1;
	return (bit);
}
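For example, ffs(0) returns 0, ffs(1) returns 1, and ffs(12) returns 3, since the lowest set bit of binary 1100 is bit 3 counting from one.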
The next module for attention is the Dalvik virtual machine. Again, this has some code that relies on ARMv5, but there is a C version to fall back on. In this case it also means pulling in libffi. This is probably the module whose code most needs updating to real ARMv4 assembler in the near future.
project dalvik/

diff --git a/vm/Android.mk b/vm/Android.mk
index dfed78d..c66a861 100644
--- a/vm/Android.mk
+++ b/vm/Android.mk
@@ -189,6 +189,7 @@ ifeq ($(TARGET_SIMULATOR),true)
 endif
 
 ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 	# use custom version rather than FFI
 	#LOCAL_SRC_FILES += arch/arm/CallC.c
 	LOCAL_SRC_FILES += arch/arm/CallOldABI.S arch/arm/CallEABI.S
@@ -204,6 +205,16 @@ else
 		mterp/out/InterpC-desktop.c \
 		mterp/out/InterpAsm-desktop.S
 	LOCAL_SHARED_LIBRARIES += libffi
+	LOCAL_SHARED_LIBRARIES += libdl
+endif
+else
+	# use FFI
+	LOCAL_C_INCLUDES += external/libffi/$(TARGET_OS)-$(TARGET_ARCH)
+	LOCAL_SRC_FILES += arch/generic/Call.c
+	LOCAL_SRC_FILES += \
+		mterp/out/InterpC-desktop.c \
+		mterp/out/InterpAsm-desktop.S
+	LOCAL_SHARED_LIBRARIES += libffi
 endif
 
 LOCAL_MODULE := libdvm
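For those who haven't met it, libffi builds a call frame at run time instead of relying on hand-written, ABI-specific assembler like CallEABI.S, and that is what the generic Call.c path leans on. A minimal standalone sketch of the libffi API (illustrative only, not Dalvik's actual code):

    #include <ffi.h>
    #include <stdio.h>

    static int add(int a, int b) { return a + b; }

    int main(void)
    {
        ffi_cif cif;
        ffi_type *arg_types[] = { &ffi_type_sint, &ffi_type_sint };
        int a = 2, b = 3;
        void *arg_values[] = { &a, &b };
        ffi_arg result;

        /* Describe the call (default ABI, two ints in, int out), then make it. */
        if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 2, &ffi_type_sint, arg_types) == FFI_OK) {
            ffi_call(&cif, FFI_FN(add), &result, arg_values);
            printf("%d\n", (int)result);  /* prints 5 */
        }
        return 0;
    }

The flexibility costs many cycles per call compared to direct assembler, which is part of why fixing up the real ARMv4 call path stays high on the to-do list.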
Next is libjpeg which, again, has assembler optimisations that we can't easily use without real porting work, so we fall back to the C implementation.
project external/jpeg/

diff --git a/Android.mk b/Android.mk
index 9cfe4f6..3c052cd 100644
--- a/Android.mk
+++ b/Android.mk
@@ -19,6 +19,12 @@ ifneq ($(TARGET_ARCH),arm)
 ANDROID_JPEG_NO_ASSEMBLER := true
 endif
 
+# the assembler doesn't work for armv4t
+ifeq ($(TARGET_ARCH_VERSION),armv4t)
+ANDROID_JPEG_NO_ASSEMBLER := true
+endif
+
+
 # temp fix until we understand why this broke cnn.com
 #ANDROID_JPEG_NO_ASSEMBLER := true
For some reason compiling for ARMv4 doesn't allow the prefetch-loop-arrays compiler optimisation (presumably because the older architecture lacks a prefetch instruction), so we turn it off for ARMv4.
@@ -29,7 +35,10 @@ LOCAL_SRC_FILES += jidctint.c jidctfst.S
 endif
 
 LOCAL_CFLAGS += -DAVOID_TABLES
-LOCAL_CFLAGS += -O3 -fstrict-aliasing -fprefetch-loop-arrays
+LOCAL_CFLAGS += -O3 -fstrict-aliasing
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
+LOCAL_CFLAGS += -fprefetch-loop-arrays
+endif
 #LOCAL_CFLAGS += -march=armv6j
 LOCAL_MODULE:= libjpeg
Next up is libffi, which is just a case of turning it on since we now need it for ARMv4.
project external/libffi/

diff --git a/Android.mk b/Android.mk
index f4452c9..07b5c2f 100644
--- a/Android.mk
+++ b/Android.mk
@@ -6,7 +6,7 @@
 # We need to generate the appropriate defines and select the right set of
 # source files for the OS and architecture.
 
-ifneq ($(TARGET_ARCH),arm)
+ifneq ($(TARGET_ARCH_VERSION),armv5te)
 
 LOCAL_PATH:= $(call my-dir)
 include $(CLEAR_VARS)
The external opencore module contains a lot of software-implemented codecs. (I wonder about the licensing restrictions on these things...) Not surprisingly, these too are tuned for ARMv5, but again we fall back to plain old C.
project external/opencore/

diff --git a/codecs_v2/audio/aac/dec/Android.mk b/codecs_v2/audio/aac/dec/Android.mk
index ffe0089..6abdc2d 100644
--- a/codecs_v2/audio/aac/dec/Android.mk
+++ b/codecs_v2/audio/aac/dec/Android.mk
@@ -150,7 +150,7 @@ LOCAL_SRC_FILES := \
 LOCAL_MODULE := libpv_aac_dec
 
 LOCAL_CFLAGS := -DAAC_PLUS -DHQ_SBR -DPARAMETRICSTEREO $(PV_CFLAGS)
-ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 LOCAL_CFLAGS += -D_ARM_GCC
 else
 LOCAL_CFLAGS += -DC_EQUIVALENT
diff --git a/codecs_v2/audio/gsm_amr/amr_wb/dec/Android.mk b/codecs_v2/audio/gsm_amr/amr_wb/dec/Android.mk
index e184178..3223841 100644
--- a/codecs_v2/audio/gsm_amr/amr_wb/dec/Android.mk
+++ b/codecs_v2/audio/gsm_amr/amr_wb/dec/Android.mk
@@ -48,7 +48,7 @@ LOCAL_SRC_FILES := \
 LOCAL_MODULE := libpvamrwbdecoder
 
 LOCAL_CFLAGS := $(PV_CFLAGS)
-ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 LOCAL_CFLAGS += -D_ARM_GCC
 else
 LOCAL_CFLAGS += -DC_EQUIVALENT
diff --git a/codecs_v2/audio/mp3/dec/Android.mk b/codecs_v2/audio/mp3/dec/Android.mk
index 254cb6b..c2430fe 100644
--- a/codecs_v2/audio/mp3/dec/Android.mk
+++ b/codecs_v2/audio/mp3/dec/Android.mk
@@ -28,8 +28,8 @@ LOCAL_SRC_FILES := \
 	src/pvmp3_seek_synch.cpp \
 	src/pvmp3_stereo_proc.cpp \
 	src/pvmp3_reorder.cpp
-
-ifeq ($(TARGET_ARCH),arm)
+
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 LOCAL_SRC_FILES += \
 	src/asm/pvmp3_polyphase_filter_window_gcc.s \
 	src/asm/pvmp3_mdct_18_gcc.s \
@@ -46,7 +46,7 @@ endif
 LOCAL_MODULE := libpvmp3
 
 LOCAL_CFLAGS := $(PV_CFLAGS)
-ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 LOCAL_CFLAGS += -DPV_ARM_GCC
 else
 LOCAL_CFLAGS += -DC_EQUIVALENT
Unfortunately it is not just the build file that needs updating in this module. I need to manually update the headers so that the optimised inline assembler is only used in the ARMv5 case. To be honest, this messes these files up a little bit, so a nicer solution would be preferred.
diff --git a/codecs_v2/video/m4v_h263/enc/src/dct_inline.h b/codecs_v2/video/m4v_h263/enc/src/dct_inline.h
index 86474b2..41a3297 100644
--- a/codecs_v2/video/m4v_h263/enc/src/dct_inline.h
+++ b/codecs_v2/video/m4v_h263/enc/src/dct_inline.h
@@ -22,7 +22,7 @@
 #ifndef _DCT_INLINE_H_
 #define _DCT_INLINE_H_
 
-#if !defined(PV_ARM_GCC)&& defined(__arm__)
+#if !(defined(PV_ARM_GCC) && defined(__arm__) && defined(__ARM_ARCH_5TE__))
 
 #include "oscl_base_macros.h"
 
@@ -109,7 +109,7 @@ __inline int32 sum_abs(int32 k0, int32 k1, int32 k2, int32 k3,
 #elif defined(__CC_ARM)  /* only work with arm v5 */
 
 #if defined(__TARGET_ARCH_5TE)
-
+#error
 __inline int32 mla724(int32 op1, int32 op2, int32 op3)
 {
     int32 out;
@@ -266,7 +266,7 @@ __inline int32 sum_abs(int32 k0, int32 k1, int32 k2, int32 k3,
     return abs_sum;
 }
 
-#elif defined(PV_ARM_GCC) && defined(__arm__) /* ARM GNU COMPILER */
+#elif defined(PV_ARM_GCC) && defined(__arm__) && defined(__ARM_ARCH_5TE__) /* ARM GNU COMPILER */
 
 __inline int32 mla724(int32 op1, int32 op2, int32 op3)
 {
diff --git a/codecs_v2/video/m4v_h263/enc/src/fastquant_inline.h b/codecs_v2/video/m4v_h263/enc/src/fastquant_inline.h
index 6a35d43..fbfeddf 100644
--- a/codecs_v2/video/m4v_h263/enc/src/fastquant_inline.h
+++ b/codecs_v2/video/m4v_h263/enc/src/fastquant_inline.h
@@ -25,7 +25,7 @@
 #include "mp4def.h"
 #include "oscl_base_macros.h"
 
-#if !defined(PV_ARM_GCC) && defined(__arm__) /* ARM GNU COMPILER */
+#if !(defined(PV_ARM_GCC) && defined(__arm__) && defined(__ARM_ARCH_5TE__)) /* ARM GNU COMPILER */
 
 __inline int32 aan_scale(int32 q_value, int32 coeff, int32 round, int32 QPdiv2)
 {
@@ -423,7 +423,7 @@ __inline int32 coeff_dequant_mpeg_intra(int32 q_value, int32 tmp)
     return q_value;
 }
 
-#elif defined(PV_ARM_GCC) && defined(__arm__) /* ARM GNU COMPILER */
+#elif defined(PV_ARM_GCC) && defined(__arm__) && defined(__ARM_ARCH_5TE__) /* ARM GNU COMPILER */
 
 __inline int32 aan_scale(int32 q_value, int32 coeff, int32 round, int32 QPdiv2)
diff --git a/codecs_v2/video/m4v_h263/enc/src/vlc_encode_inline.h b/codecs_v2/video/m4v_h263/enc/src/vlc_encode_inline.h
index 69857f3..b0bf46d 100644
--- a/codecs_v2/video/m4v_h263/enc/src/vlc_encode_inline.h
+++ b/codecs_v2/video/m4v_h263/enc/src/vlc_encode_inline.h
@@ -18,7 +18,7 @@
 #ifndef _VLC_ENCODE_INLINE_H_
 #define _VLC_ENCODE_INLINE_H_
 
-#if !defined(PV_ARM_GCC)&& defined(__arm__)
+#if !(defined(PV_ARM_GCC) && defined(__arm__) && defined(__ARM_ARCH_5TE__))
 
 __inline Int zero_run_search(UInt *bitmapzz, Short *dataBlock, RunLevelBlock *RLB, Int nc)
 {
@@ -208,7 +208,7 @@ __inline Int zero_run_search(UInt *bitmapzz, Short *dataBlock, RunLevelBlock *R
     return idx;
 }
 
-#elif defined(PV_ARM_GCC) && defined(__arm__) /* ARM GNU COMPILER */
+#elif defined(PV_ARM_GCC) && defined(__arm__) && defined(__ARM_ARCH_5TE__) /* ARM GNU COMPILER */
 
 __inline Int m4v_enc_clz(UInt temp)
 {
A similar approach is needed in the skia graphics library.
project external/skia/

diff --git a/include/corecg/SkMath.h b/include/corecg/SkMath.h
index 76cf279..5f0264f 100644
--- a/include/corecg/SkMath.h
+++ b/include/corecg/SkMath.h
@@ -162,7 +162,7 @@ static inline int SkNextLog2(uint32_t value) {
     With this requirement, we can generate faster instructions on some
     architectures.
 */
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     static inline int32_t SkMulS16(S16CPU x, S16CPU y) {
         SkASSERT((int16_t)x == x);
         SkASSERT((int16_t)y == y);
The sonivox module (no idea what that is!) has the same requirement of updating the build to avoid building ARMv5-specific code.
project external/sonivox/

diff --git a/arm-wt-22k/Android.mk b/arm-wt-22k/Android.mk
index 565c233..a59f917 100644
--- a/arm-wt-22k/Android.mk
+++ b/arm-wt-22k/Android.mk
@@ -73,6 +73,7 @@ LOCAL_COPY_HEADERS := \
 	host_src/eas_reverb.h
 
 ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 LOCAL_SRC_FILES+= \
 	lib_src/ARM-E_filter_gnu.s \
 	lib_src/ARM-E_interpolate_loop_gnu.s \
The low-level audio code in audioflinger relies on the same sort of optimisations, and on this occasion we need to dive into the code to fix things up.
project frameworks/base/

diff --git a/libs/audioflinger/AudioMixer.cpp b/libs/audioflinger/AudioMixer.cpp
index 9f1b17f..4c0890c 100644
--- a/libs/audioflinger/AudioMixer.cpp
+++ b/libs/audioflinger/AudioMixer.cpp
@@ -400,7 +400,7 @@ void AudioMixer::process__validate(state_t* state, void* output)
 static inline
 int32_t mulAdd(int16_t in, int16_t v, int32_t a)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     asm( "smlabb %[out], %[in], %[v], %[a] \n"
          : [out]"=r"(out)
@@ -415,7 +415,7 @@ int32_t mulAdd(int16_t in, int16_t v, int32_t a)
 static inline
 int32_t mul(int16_t in, int16_t v)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     asm( "smulbb %[out], %[in], %[v] \n"
          : [out]"=r"(out)
@@ -430,7 +430,7 @@ int32_t mul(int16_t in, int16_t v)
 static inline
 int32_t mulAddRL(int left, uint32_t inRL, uint32_t vRL, int32_t a)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     if (left) {
         asm( "smlabb %[out], %[inRL], %[vRL], %[a] \n"
@@ -456,7 +456,7 @@ int32_t mulAddRL(int left, uint32_t inRL, uint32_t vRL, int32_t a)
 static inline
 int32_t mulRL(int left, uint32_t inRL, uint32_t vRL)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     if (left) {
         asm( "smulbb %[out], %[inRL], %[vRL] \n"
diff --git a/libs/audioflinger/AudioResamplerSinc.cpp b/libs/audioflinger/AudioResamplerSinc.cpp
index e710d16..88b8c22 100644
--- a/libs/audioflinger/AudioResamplerSinc.cpp
+++ b/libs/audioflinger/AudioResamplerSinc.cpp
@@ -62,7 +62,7 @@ const int32_t AudioResamplerSinc::mFirCoefsDown[] = {
 static inline
 int32_t mulRL(int left, int32_t in, uint32_t vRL)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     if (left) {
         asm( "smultb %[out], %[in], %[vRL] \n"
@@ -88,7 +88,7 @@ int32_t mulRL(int left, int32_t in, uint32_t vRL)
 static inline
 int32_t mulAdd(int16_t in, int32_t v, int32_t a)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     asm( "smlawb %[out], %[v], %[in], %[a] \n"
          : [out]"=r"(out)
@@ -103,7 +103,7 @@ int32_t mulAdd(int16_t in, int32_t v, int32_t a)
 static inline
 int32_t mulAddRL(int left, uint32_t inRL, int32_t v, int32_t a)
 {
-#if defined(__arm__) && !defined(__thumb__)
+#if defined(__arm__) && defined(__ARM_ARCH_5TE__) && !defined(__thumb__)
     int32_t out;
     if (left) {
         asm( "smlawb %[out], %[v], %[inRL], %[a] \n"
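For a sense of what the fallback path costs: when the 5TE guard is false these helpers compile to the portable C in their #else branches, which looks something like this (a sketch of the shape of the fallback, not the exact file contents):

    #include <stdint.h>

    /* Portable equivalent of the smlabb-based mulAdd: a + in * v,
     * done as an ordinary multiply followed by an add. */
    static inline int32_t mulAdd_c(int16_t in, int16_t v, int32_t a)
    {
        return a + (int32_t)in * v;
    }

On ARMv5E the whole thing is a single smlabb multiply-accumulate instruction; on ARMv4 the compiler emits a plain multiply and add.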
The AndroidConfig.h header file is included on every compile. We mess with it to convince the rest of the system that we don't have an optimised memcmp16 function.
project system/core/

diff --git a/include/arch/linux-arm/AndroidConfig.h b/include/arch/linux-arm/AndroidConfig.h
index d7e182a..76f424e 100644
--- a/include/arch/linux-arm/AndroidConfig.h
+++ b/include/arch/linux-arm/AndroidConfig.h
@@ -249,8 +249,9 @@
 /*
  * Do we have __memcmp16()?
  */
+#if defined(__ARM_ARCH_5TE__)
 #define HAVE__MEMCMP16  1
-
+#endif
 /*
  * type for the third argument to mincore().
  */
Next up is pixelflinger, where things get interesting, because all of a sudden we have ARMv6 code. I've taken the rash decision of wrapping this in conditionals that are only enabled if you are actually building for ARMv6, not a pesky ARMv5E, but I really need to better understand the intent here. It seems a little strange.
diff --git a/libpixelflinger/Android.mk b/libpixelflinger/Android.mk
index a8e5ee4..077cf47 100644
--- a/libpixelflinger/Android.mk
+++ b/libpixelflinger/Android.mk
@@ -5,7 +5,7 @@ include $(CLEAR_VARS)
 #
 # ARMv6 specific objects
 #
-ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv6)
 LOCAL_ASFLAGS := -march=armv6
 LOCAL_SRC_FILES := rotate90CW_4x4_16v6.S
 LOCAL_MODULE := libpixelflinger_armv6
@@ -39,7 +39,7 @@ PIXELFLINGER_SRC_FILES:= \
 	raster.cpp \
 	buffer.cpp
 
-ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv5te)
 PIXELFLINGER_SRC_FILES += t32cb16blend.S
 endif
 
@@ -67,7 +67,7 @@ ifneq ($(BUILD_TINY_ANDROID),true)
 LOCAL_MODULE:= libpixelflinger
 LOCAL_SRC_FILES := $(PIXELFLINGER_SRC_FILES)
 LOCAL_CFLAGS := $(PIXELFLINGER_CFLAGS) -DWITH_LIB_HARDWARE
-ifeq ($(TARGET_ARCH),arm)
+ifeq ($(TARGET_ARCH_VERSION),armv6)
 LOCAL_WHOLE_STATIC_LIBRARIES := libpixelflinger_armv6
 endif
 include $(BUILD_SHARED_LIBRARY)
Finally, scanline has an optimised asm version that it calls in preference to doing the same thing inline with C code. Again, I take the easy way out and use the C code.
diff --git a/libpixelflinger/scanline.cpp b/libpixelflinger/scanline.cpp
index d24c988..685a3b7 100644
--- a/libpixelflinger/scanline.cpp
+++ b/libpixelflinger/scanline.cpp
@@ -1312,7 +1312,7 @@ void scanline_t32cb16blend(context_t* c)
     const int32_t v = (c->state.texture[0].shade.it0>>16) + y;
     uint32_t *src = reinterpret_cast<uint32_t*>(tex->data)+(u+(tex->stride*v));
 
-#if ((ANDROID_CODEGEN >= ANDROID_CODEGEN_ASM) && defined(__arm__))
+#if ((ANDROID_CODEGEN >= ANDROID_CODEGEN_ASM) && defined(__arm__) && defined(__ARM_ARCH_5TE__))
     scanline_t32cb16blend_arm(dst, src, ct);
 #else
     while (ct--) {
And that, my friends, is that! Now to see if I can actually run this code!
So, last week I posted on the difference between trusted and trustworthy code. Picking up that thread again: if there is some code in my system that is trusted, how can I tell whether it is actually trustworthy?
Ruling out truly malicious code, our assessment of the trustworthiness of code really comes down to an assessment of its quality, and a pretty reasonable proxy for code quality is the number of errors in the code, or more specifically, the lack thereof. So the question becomes: how can we determine whether a piece of code has a small number of bugs?
The number of errors in a body of code is going to be the product
of two important factors: the defect density and the size of the
code. In general, defect density is measured in bugs per
thousand lines of code (KLOC), and the size of the code is measured in
lines of code. Now, there are plenty of arguments about what the
“right” method is for measuring lines of code, and in general
you can only know the exact defect density after all the bugs have been
found; and of course, as Dijkstra observed, program testing can be used
to show the presence of bugs, but never to show their absence!
So, to lower the number of
bugs that exist in a code base there are really two options: reduce
the number of lines of code, or improve the defect density.
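To put rough numbers on that product (an illustrative calculation, not measured data):

\[ \text{bugs} \approx d \times \frac{L}{1000} \]

where \(d\) is the defect density in bugs/KLOC and \(L\) is the size in lines of code. So a 50,000-line code base at 2 bugs/KLOC carries around 100 bugs, while a 10,000-line code base at the same density carries around 20.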
So, how do we improve, that is, reduce, the defect density? Well, there are a number of pretty well known ways. Effective testing, despite the caveat above, goes a long way to reducing the number of bugs in a code base (assuming that you actually fix the bugs you find!). Static analysis, in its various forms, is also an important tool for increasing the quality of code, and is often a great complement to testing, as it can expose bugs in code that is impractical to test. And of course there are other formal methods, like model checking, which can help eliminate bugs from the design phase; a great example of this is the SPIN model checker. Code reviews, while time intensive, are also a great way of finding bugs that would otherwise fly under the radar. Another way to improve code quality is to write simple, rather than complex, code; McCabe's cyclomatic complexity measure can be one good indicator of this. This is, of course, just a sampling of some of the aspects of software quality; see Wikipedia and McConnell's Code Complete for more information.
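For reference, McCabe's measure is computed directly from a routine's control-flow graph:

\[ M = E - N + 2P \]

where \(E\) is the number of edges, \(N\) the number of nodes, and \(P\) the number of connected components (1 for a single routine). A straight-line function scores 1, and every extra decision point pushes both the score and the number of paths you need to test higher.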
Now, how do you know if some code base has actually undergone testing? How do you know if static analysis has been performed on the source code? How do you know if the code has undergone a thorough code review? Well, generally there are two ways: you trust the developer of that code, or you get someone you do trust to do the quality analysis (which may be yourself). This is the point where things quickly shift from technological solutions into the fuzzy world of economics, social science, psychology and legal theory, as we try to determine the trustworthiness of another entity, be it a person or a corporation. The one technologically relevant part is that it is much more difficult to do a third-party analysis of code quality without access to the source code. Note: I am not saying that open source software is more trustworthy, simply that making the source available enables third-party assessments of code quality, which may make it easier for some people to trust the code.
So, improving code quality, and thereby reducing defect density, is one side of the equation, but even if you have a very low defect density, for example less than 1/KLOC, you can still have a large number of bugs if your code base is large. So it is very important to reduce the size of the code base as well. A small code base doesn't just have the direct benefit of reducing the size part of the equation; it also helps improve the defect density part. Why? Well, almost all the techniques mentioned above are more effective or tractable on a small code base. You can usually get much better code coverage, and even a reasonable amount of path coverage, with a smaller code base, and code reviews can be more comprehensive. Now, to some extent those techniques can still work for large code bases, if you just throw more programmers at the problem, but static analysis is another matter. Many of the algorithms and techniques involved in static analysis have polynomial, or even exponential, computational complexity in the number of lines of code. So an analysis that takes an hour on a 10,000-line code base could end up taking all week to run on a code base of 100,000 lines.
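That hour-to-a-week jump is just what a quadratic algorithm predicts (an illustrative calculation):

\[ T(n) \propto n^2 \;\Rightarrow\; T(100\,\text{KLOC}) = \left(\frac{100}{10}\right)^2 \times T(10\,\text{KLOC}) = 100 \times 1\,\text{hour} \approx 4\,\text{days} \]

and for an exponential-time analysis the situation becomes hopeless long before that size.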
Of course, this doesn't address the problem of how you assure that the code you think is in your TCB is really what you think it is. That topic gets us into trusted boot, trusted platform modules, code signing and so on, which I'm not going to try to address in this post.
Now, it should be very clear that if you want to be able to trust your trusted computing base, then it is going to need to be both small and high quality.