I'm really interested in finding out what is happening underneath
the sheets on the Android platform. The first step to this is getting
access to the Android shell. There are two options here: you can either
start the emulator with a console option: $ emulator console,
or connect to the console after boot using: $ adb shell. A convention
that I use throughout this document is that things to type into your host shell use
the '$' prompt, and things to type into the Android console use '#'.
Apology: this information isn't particularly well structured; it has mostly been written as a log of my initial explorations. At some time in the future I might actually order it in a sane way.
So our toolbox of useful POSIX commands is a little more limited than
usual, but the basics of ls, ps, cat and echo are there. So
we start by seeing what programs we can run. First we do an # echo $PATH,
which shows we can expect useful programs in /sbin, /system/sbin and
/system/bin. So let's look a little closer. # ls /sbin shows us:
-rwxr-xr-x root root 226940 1970-01-01 00:00 recovery
-rwxr-xr-x root root 102864 1970-01-01 00:00 adbd
recovery seems to be some kind of recovery tool for
when you have screwed things up a bit too much. Running recovery from
the build directory actually outputs text on the LCD screen,
specifically: W: Can't open version file "/etc/build.prop": No
such file or directory. There isn't a build.prop
file in /etc, but there is one in /system;
it would be interesting to see what happens when recovery does find the
file. Unfortunately, we don't actually have cp, but we
can use adb to do our dirty work: $ adb pull
/system/build.prop .; adb push build.prop /etc. Running
recovery now doesn't actually do anything very useful, so much for that.
I presume that it is meant to be run by some process on the host, so
we'll ignore this avenue of enquiry for now.
adbd seems pretty obvious: it looks like the daemon
that adb talks to. If we run # ps, we can see
that the adbd process is running.
It turns out there is no /system/sbin directory, so we
are left looking at what is in /system/bin, which seems
to be where the bulk of the files are, around 100 executables.
-rwxr-xr-x root root 4280 2007-11-11 20:57 AudioHardwareRecord
-rwxr-xr-x root root 4152 2007-11-11 20:57 AudioInRecord
-rwxr-xr-x root root 4672 2007-11-11 20:57 RecursiveMutexTest
-rwxr-xr-x root root 20932 2007-11-11 20:57 SRecTest
-rwxr-xr-x root root 20408 2007-11-11 20:57 SRecTestAudio
-rwxr-xr-x root root 4860 2007-11-11 20:57 UAPI_PortabilityTest
-rwxr-xr-x root root 12420 2007-11-11 20:57 UAPI_SrecTest
-rwxr-xr-x root root 29560 2007-11-11 20:57 UAPI_test
-rwxr-xr-x root root 196 2007-11-11 20:46 am
-rwxr-xr-x root root 5124 2007-11-11 20:57 app_process
lrwxr-xr-x root root 2007-11-11 20:57 cat -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 chmod -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 cmp -> toolbox
-rwxr-xr-x root root 3912 2007-11-11 20:57 crasher
-rwxr-xr-x root root 4892 2007-11-11 20:57 dalvikvm
lrwxr-xr-x root root 2007-11-11 20:57 date -> toolbox
-rwxr-xr-x root root 118096 2007-11-11 20:57 dbus-daemon
lrwxr-xr-x root root 2007-11-11 20:57 dd -> toolbox
-rwxr-xr-x root root 9264 2007-11-11 20:57 debuggerd
-rwxr-xr-x root root 17416 2007-11-11 20:57 dexdump
-rwxr-xr-x root root 3944 2007-11-11 20:57 dexopt
lrwxr-xr-x root root 2007-11-11 20:57 df -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 dmesg -> toolbox
-rwxr-xr-x root root 73748 2007-11-11 20:57 drm1_unit_test
-rwxr-xr-x root root 85608 2007-11-11 20:57 drm2_unit_test
-rwxr-xr-x root root 1240 2007-11-11 20:46 dumpstate
-rwxr-xr-x root root 6808 2007-11-11 20:57 dumpsys
lrwxr-xr-x root root 2007-11-11 20:57 exists -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 getevent -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 getprop -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 hd -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 ifconfig -> toolbox
-rwxr-xr-x root root 208 2007-11-11 20:46 input
lrwxr-xr-x root root 2007-11-11 20:57 insmod -> toolbox
-rwxr-xr-x root root 6696 2007-11-11 20:57 install_boot_image
lrwxr-xr-x root root 2007-11-11 20:57 ioctl -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 kill -> toolbox
-rwxr-xr-x root root 81236 2007-11-11 20:57 linker
lrwxr-xr-x root root 2007-11-11 20:57 ln -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 log -> toolbox
-rwxr-xr-x root root 8772 2007-11-11 20:57 logcat
lrwxr-xr-x root root 2007-11-11 20:57 ls -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 lsmod -> toolbox
-rwxr-xr-x root root 7596 2007-11-11 20:57 mem_profiler
lrwxr-xr-x root root 2007-11-11 20:57 mkdir -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 mkdosfs -> toolbox
-rwxr-xr-x root root 212 2007-11-11 20:46 monkey
lrwxr-xr-x root root 2007-11-11 20:57 mount -> toolbox
-rwxr-xr-x root root 2888 2007-11-11 20:57 mtptest
-rwxr-xr-x root root 8640 2007-11-11 20:57 netcfg
lrwxr-xr-x root root 2007-11-11 20:57 netstat -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 notify -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 ping -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 powerd -> toolbox
-rwxr-xr-x root root 144356 2007-11-11 20:57 pppd
lrwxr-xr-x root root 2007-11-11 20:57 printenv -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 ps -> toolbox
-rwxr-xr-x root root 8724 2007-11-11 20:58 pv
lrwxr-xr-x root root 2007-11-11 20:57 r -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 readtty -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 reboot -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 renice -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 resetradio -> toolbox
-rwxr-xr-x root root 4116 2007-11-11 20:57 rild
lrwxr-xr-x root root 2007-11-11 20:57 rm -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 rmdir -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 rmmod -> toolbox
-rwxr-xr-x root root 1027 2007-11-11 20:47 ro.xml
-rwxr-xr-x root root 1782 2007-11-11 20:47 ro2.xml
-rwxr-xr-x root root 98 2007-11-11 20:47 roerror.xml
lrwxr-xr-x root root 2007-11-11 20:57 rotatefb -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 route -> toolbox
-rwxr-xr-x root root 45484 2007-11-11 20:57 runtime
-rwxr-xr-x root root 4572 2007-11-11 20:57 sdutil
lrwxr-xr-x root root 2007-11-11 20:57 sendevent -> toolbox
-rwxr-xr-x root root 6148 2007-11-11 20:57 service
lrwxr-xr-x root root 2007-11-11 20:57 setconsole -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 setkey -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 setprop -> toolbox
-rwxr-xr-x root root 90344 2007-11-11 20:57 sh
-rwxr-xr-x root root 5116 2007-11-11 20:57 showmap
-rwxr-xr-x root root 7524 2007-11-11 20:57 showslab
lrwxr-xr-x root root 2007-11-11 20:57 sleep -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 smd -> toolbox
-rwxr-xr-x root root 25944 2007-11-11 20:57 sqlite3
-rwxr-xr-x root root 411 2007-11-11 20:47 ssltest
lrwxr-xr-x root root 2007-11-11 20:57 start -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 stop -> toolbox
-rwsr-sr-x root root 72436 2007-11-11 20:57 su
lrwxr-xr-x root root 2007-11-11 20:57 sync -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 syren -> toolbox
-rwxr-xr-x root root 2772 2007-11-11 20:57 system_server
-rwxr-xr-x root root 88052 2007-11-11 20:57 toolbox
lrwxr-xr-x root root 2007-11-11 20:57 umount -> toolbox
-rwxr-xr-x root root 16612 2007-11-11 20:57 usbd
lrwxr-xr-x root root 2007-11-11 20:57 watchprops -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 webgrab -> toolbox
lrwxr-xr-x root root 2007-11-11 20:57 wipe -> toolbox
We can use # mount to get an idea of what is happening.
rootfs / rootfs rw 0 0
/dev/pts /dev/pts devpts rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/dev/block/mtdblock0 /system yaffs2 rw,nodev,noatime,nodiratime 0 0
/dev/block/mtdblock1 /data yaffs2 rw,nodev,noatime,nodiratime 0 0
The root filesystem is
using rootfs,
which basically means it is whatever was in the initial ramfs. It is
interesting that there is no tmpfs, which means an
application writing to /tmp could exhaust system
memory. This doesn't seem like a good denial-of-service to leave
open. Of course the root filesystem is no good for storing files,
since it has no backing store.
The initial ramfs is loaded by the emulator
from $SDKROOT/tools/lib/images/ramdisk.img. This is
simply a gzipped CPIO archive, so if you wanted to modify it for
some reason, it wouldn't be too hard to do. Inspecting it, there isn't
too much of interest there though; most of the interesting stuff is on
the /system and /data partitions.
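Poking at a ramdisk like that doesn't need any special tools; the newc cpio header format is simple enough to walk by hand. Here is a minimal sketch of listing the file names in a gzipped newc archive. The pack_entry helper exists only so the example is self-contained (a real image comes from the SDK), and it cuts corners: checksums are zero and hard links are ignored.

```python
import gzip

MAGIC = b"070701"  # newc ("new ASCII") cpio magic

def pack_entry(name, data=b"", mode=0o100644):
    """Build one simplified newc cpio entry, purely for demonstration."""
    namesize = len(name) + 1  # name is NUL-terminated
    # ino, mode, uid, gid, nlink, mtime, filesize, devmaj, devmin,
    # rdevmaj, rdevmin, namesize, check -- 13 fields of 8 hex digits
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, namesize, 0]
    out = MAGIC + b"".join(b"%08X" % f for f in fields)
    out += name.encode() + b"\0"
    out += b"\0" * (-len(out) % 4)          # header+name padded to 4 bytes
    out += data + b"\0" * (-len(data) % 4)  # file data padded to 4 bytes
    return out

def list_names(blob):
    """Walk the archive headers and collect file names."""
    names, off = [], 0
    while blob[off:off + 6] == MAGIC:
        filesize = int(blob[off + 54:off + 62], 16)
        namesize = int(blob[off + 94:off + 102], 16)
        name = blob[off + 110:off + 110 + namesize - 1].decode()
        if name == "TRAILER!!!":            # end-of-archive marker
            break
        names.append(name)
        off += 110 + namesize
        off += -off % 4                     # skip name padding
        off += filesize
        off += -off % 4                     # skip data padding
    return names

blob = (pack_entry("init.rc", b"hello") + pack_entry("default.prop") +
        pack_entry("TRAILER!!!"))
img = gzip.compress(blob)                   # stands in for ramdisk.img
print(list_names(gzip.decompress(img)))
```

Against the real ramdisk.img you would just gunzip the file first and feed the bytes to list_names.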
/system is
a yaffs2 filesystem, using
the first NAND chip as its backing store. # df shows us
that there is a 64MB backing store, and it is about half used. This
directory is where the core of the Android system is stored. It seems
as though a normal user shouldn't ever need to update it, but I'm sure
over-the-air updates would be applied to this file system.
In the emulator the system image is located
at $SDKROOT/tools/lib/images/system.img, although you should
be able to change that by using the -image parameter when
running the emulator. The emulator does not write back changes to the system
file-system, so it will be fresh each time the emulator is loaded.
The /data file system is very similar to /system:
it is backed by 64MB of NAND, and is using yaffs2. This is the filesystem on which
new user applications are installed and run. Changes made to the /data
filesystem are honoured by the emulator; the updated NAND image is stored in ~/.android/userdata.img.
To get an idea of what is actually executing on the phone we use
trusty # ps, which gives us:
USER PID PPID VSIZE RSS WCHAN PC NAME
root 1 0 252 164 c0082240 0000ab0c S /init
root 2 0 0 0 c0048eac 00000000 S kthreadd
root 3 2 0 0 c003acf0 00000000 S ksoftirqd/0
root 4 2 0 0 c0045e5c 00000000 S events/0
root 5 2 0 0 c0045e5c 00000000 S khelper
root 8 2 0 0 c0045e5c 00000000 S suspend/0
root 33 2 0 0 c0045e5c 00000000 S kblockd/0
root 36 2 0 0 c0045e5c 00000000 S cqueue/0
root 38 2 0 0 c0150c44 00000000 S kseriod
root 75 2 0 0 c005bed0 00000000 S pdflush
root 76 2 0 0 c005bed0 00000000 S pdflush
root 77 2 0 0 c005f880 00000000 S kswapd0
root 78 2 0 0 c0045e5c 00000000 S aio/0
root 201 2 0 0 c014e2f4 00000000 S mtdblockd
root 217 2 0 0 c0045e5c 00000000 S kmmcd
root 231 2 0 0 c0045e5c 00000000 S rpciod/0
root 450 1 720 288 c00386a4 afe092ac S /system/bin/sh
root 451 1 3316 128 ffffffff 0000ceb4 S /sbin/adbd
root 452 1 2816 284 ffffffff afe08b9c S /system/bin/usbd
root 453 1 636 216 c017c114 afe08e9c S /system/bin/debuggerd
root 454 1 12576 584 ffffffff afe08b9c S /system/bin/rild
root 455 1 56572 14608 c01dc388 afe083dc S zygote
root 456 1 20576 2076 ffffffff afe0861c S /system/bin/runtime
bluetooth 458 1 1200 760 c0082240 afe0947c S /system/bin/dbus-daemon
root 467 455 93632 17284 ffffffff afe0861c S system_server
app_4 505 455 76104 13632 ffffffff afe09604 S com.google.android.home
phone 509 455 75876 14952 ffffffff afe09604 S com.google.android.phone
app_2 523 455 74528 14332 ffffffff afe09604 S com.google.process.content
root 538 450 932 312 00000000 afe083dc R ps
Unfortunately we don't have pstree, so we don't get a
nice hierarchical display, but things aren't very complicated so we can
work it out fairly easily. First we can prune off the kernel threads,
which leaves a more manageable set of things. If we then look at
those processes directly parented by init, we are left
with:
USER PID PPID VSIZE RSS WCHAN PC NAME
root 450 1 720 288 c00386a4 afe092ac S /system/bin/sh
root 451 1 3316 128 ffffffff 0000ceb4 S /sbin/adbd
root 452 1 2816 284 ffffffff afe08b9c S /system/bin/usbd
root 453 1 636 216 c017c114 afe08e9c S /system/bin/debuggerd
root 454 1 12576 584 ffffffff afe08b9c S /system/bin/rild
root 455 1 56572 14608 c01dc388 afe083dc S zygote
root 456 1 20576 2076 ffffffff afe0861c S /system/bin/runtime
bluetooth 458 1 1200 760 c0082240 afe0947c S /system/bin/dbus-daemon
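Since the device has no pstree, a few lines of Python on the host can rebuild the hierarchy from the PID and PPID columns of the ps output. A sketch, assuming the stock column order (USER PID PPID ... STATE NAME, so the name is the last field):

```python
def tree_lines(ps_lines, root=1):
    """Turn flat ps output into an indented process tree rooted at `root`."""
    procs, children = {}, {}
    for line in ps_lines:
        f = line.split()
        pid, ppid = int(f[1]), int(f[2])   # PID and PPID columns
        procs[pid] = f[-1]                 # NAME is the last column
        children.setdefault(ppid, []).append(pid)

    def walk(pid, depth):
        yield "  " * depth + procs[pid]
        for child in sorted(children.get(pid, [])):
            for out in walk(child, depth + 1):
                yield out

    return list(walk(root, 0))

sample = [
    "root 1 0 252 164 c0082240 0000ab0c S /init",
    "root 455 1 56572 14608 c01dc388 afe083dc S zygote",
    "app_4 505 455 76104 13632 ffffffff afe09604 S com.google.android.home",
]
for line in tree_lines(sample):
    print(line)
```

Feeding it the full listing above shows at a glance that everything interesting hangs off init, with the app processes parented by zygote.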
/system/bin/sh is clearly the shell we are using, and we have already
explored adbd earlier. usbd is an interesting one. It seems
that most of the interaction with the host-side debugger happens over the USB port,
and is handled by the USB daemon. There is a usbd.conf file in
/etc. I'm not quite sure how this works though, as there do not
appear to be any USB devices in the system (# cat /proc/devices).
debuggerd is presumably the client side of the system debugger, although
I can't find out more details than that. rild is the radio-interface-link
daemon (I think! ;). The init files imply that you can connect a real modem and
get Android to use it, but by default it has the simulated Java radio interface.
zygote appears to be the launcher for the Java runtime; it isn't an
actual program, it looks like app_process masquerading as zygote.
User-land starts with the init program. This doesn't appear to be a
standard init by any stretch. At startup init parses /etc/init.rc,
which has the overall format of:
## Global environment setup
##
env {
PATH /sbin:/system/sbin:/system/bin
LD_LIBRARY_PATH /system/lib
ANDROID_BOOTLOGO 1
ANDROID_ROOT /system
ANDROID_ASSETS /system/app
ANDROID_DATA /EXTERNAL
data_STORAGE /sdcard
DRM_CONTENT /data/drm/content
}
## Setup and initialization we need to do on boot
##
onboot {
setkey 0x0 0xe5 0x706
....
# bring up loopback
ifup lo
...
}
## Processes which are run once (and only once) at system boot.
## Init will wait until each of these completes before starting
## the next one.
##
startup {
usbd-config {
exec /system/bin/usbd
args {
0 -c
}
}
qemu-init {
exec /etc/qemu-init.sh
}
}
## Processes which init runs and will re-run upon exit,
## until it is told to stop running them. These will not be
## run until all the processes in startup have been run
##
service {
console {
exec /system/bin/sh
console yes
}
adbd {
exec /sbin/adbd
}
......
}
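The format above is just nested brace-delimited blocks, so a toy parser for it is easy to sketch. This is my guess at the grammar (name { ... } blocks, key-value lines, '#' comments), not the parser init actually uses:

```python
def parse_blocks(lines):
    """Parse an init.rc-style nested block format into nested dicts."""
    def block(it):
        out = {}
        for line in it:
            line = line.split("#")[0].strip()  # drop comments and whitespace
            if not line:
                continue
            if line == "}":                    # end of the current block
                return out
            if line.endswith("{"):             # start of a nested block
                out[line[:-1].strip()] = block(it)
            else:                              # a "key value..." line
                key, _, value = line.partition(" ")
                out[key] = value.strip()
        return out
    return block(iter(lines))

sample = """\
## Global environment setup
##
env {
    PATH /sbin:/system/sbin:/system/bin
    ANDROID_ROOT /system
}
service {
    adbd {
        exec /sbin/adbd
    }
}
"""
conf = parse_blocks(sample.splitlines())
print(conf["env"]["PATH"])
```

Note that the recursion shares one iterator, so each nested block consumes exactly its own lines.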
I don't know of a general-purpose init that uses this file format,
but really, I don't know that much. It looks custom, although it has
constructs familiar from something like
Apple's launchd. strings doesn't provide
much more information, but it seems someone was frustrated enough
during development to see the need for the string Stupid C library hack !!.
Well, that is a bit of an overview. I'm sure there will be more to come as I work out more stuff.
Last night Open Kernel Labs won an iAward in the Application and Infrastructure Tools category.
It is quite cool that I can now say our microkernel is award winning! I think a lot of this is really due to the awesome engineers working here.
Continuing the theme of explaining things for Benno 5 months from now, here is how I'm currently making my backups work. The basic thing I want is incremental backups onto an external hard drive.
rsync makes this really
easy to do. The latest version of rsync has --link-dest,
which will hardlink to files in an existing directory tree, rather than
creating a new file. This makes each incremental reasonably cheap, as
most files are just hard links back to the previous backup.
So basically I do:
sudo rsync -a --link-dest=../backups.0/ /Users/ backups.1/.
Note that the trailing slash is actually important here.
A big gotcha is ensuring that the external disk is actually mounted with permissions enabled, or else the hard links won't actually be made.
After this everything works nicely, when combined with a simple script:
#!/usr/bin/env python
"""
Use rsync to backup my /Users directory onto external hard drive
backup.
"""
import os

def get_latest():
    backups = [int(x.split(".")[-1]) for x in os.listdir(".") if x.startswith("backup")]
    backups.sort()
    return backups[-1]

os.chdir("/Volumes/backup2007")
latest = get_latest()
next = latest + 1
command = "sudo rsync -a --link-dest=../backups.%d/ /Users/ backups.%d/" % (latest, next)
os.system(command)
Currently I don't do any encryption, which might be useful in the future.
Mail on UNIX is weird. I spent a few hours this week tracking down
some bugs in my mail setup, and in the process learnt a lot more about
how things interact. I'm documenting it here for Benno 5 months from
now and anyone else that cares. There is this pseudo-standard of mail
user agents (MUAs) not actually talking SMTP to a mail submission
agent (MSA), but instead simply hoping that
/usr/sbin/sendmail exists. Things are so bound this way
that you really can't avoid having a /usr/sbin/sendmail
on your machine, so as a result other mail transport agents (MTAs)
have to provide a binary that works in the same way as sendmail.
Unfortunately, as far as I can tell there is no standard for which
command line flags a sendmail-compatible program should accept; it
appears to come down to common usage.
In some ways this is quite backwards to what I would have expected, which is that an MTA would run on the localhost and programs would talk SMTP to port 25 (or the MSA port 587); then all communication is through this documented standard. On the other hand, this means every program that wants to send e-mail must have TCP and SMTP libraries compiled in, which is against the UNIX philosophy.
Now I am actually quite interested to find out what other programs (that I use), actually rely on sendmail. I'm guessing (hoping) that it is mostly just MUAs such as mutt and mailx, but I really wonder what else is out there relying on sendmail.
More importantly, what command line arguments do these different
programs expect /usr/sbin/sendmail to handle correctly?
(In case I wanted to join the hundreds of others and, say, write my own
sendmail replacement.) So after putting in a simple python program to
log the command line arguments, my great findings are:
mailx uses the -i argument. This one is
great: by default a line with a single '.' character is treated as the
end of input; with this argument, standard end-of-file means end of
input. mutt on the other hand uses -oi
(which is the same as -i) and
-oem. -oem is conveniently undocumented in
Postfix's sendmail man page, but on consulting the exim sendmail man
page, I discovered that it basically means the command can't fail,
and any errors should be reported through email, rather than with an
error return code.
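The argument-logging program mentioned above can be tiny. A sketch; the log path and the decision to simply swallow the message from stdin are my choices for illustration, not how any real MTA's binary behaves:

```python
#!/usr/bin/env python
import sys

def log_invocation(argv, logfile):
    # Record exactly how the MUA called us, e.g. "-oi -oem user@example.com".
    logfile.write(" ".join(argv[1:]) + "\n")

def main():
    # Hypothetical install location: /usr/sbin/sendmail; log path is arbitrary.
    with open("/tmp/sendmail-args.log", "a") as f:
        log_invocation(sys.argv, f)
    sys.stdin.read()  # consume the message so the calling MUA doesn't block
```

Dropped in place of the real binary, this captures one log line per submission while the MUA thinks the mail was sent.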
mutt lets you override the command it uses for mail submission by
setting the sendmail variable. This is handy to know if you
want to add extra arguments to the sendmail command line. For example
a handy feature is being able to control the envelope from address
used. This is done with the -f command line argument.
Next up for my adventures in the wonderful world of UNIX email
is to set up my own sendmail-compatible program, which can set the
envelope address and the smarthost to use based on the e-mail From:
header.
Want to help run the best open source conference in the world? Want to get involved with the local open source community? Then linux.conf.au wants you!
We are currently after volunteers interested in helping run linux.conf.au 2007.
What types of things will you be helping out on?
What's in it for you?
Here's what you do
If you have any questions please email seven-contact@lca2007.linux.org.au
I'm currently trying to remerge two versions of a source file that have diverged quite a long way. This is quite a painful process, and a tool like mgdiff makes it a lot easier to see diffs than the normal command line diff.
Unfortunately, mgdiff has one essential missing feature: the ability to update the diff as you are editing the underlying files. So with a bit of help from Erik, I cooked up this patch which adds the feature. Hopefully it will hit the upstream package soon.
Reading Jeff's humorous post
got me digging into exactly why Python would be running in a select
loop when otherwise idle.
It basically comes down to some code in the module that wraps GNU readline. The code is:
while (!has_input)
{
    struct timeval timeout = {0, 100000}; /* 0.1 seconds */
    FD_SET(fileno(rl_instream), &selectset);
    /* select resets selectset if no input was available */
    has_input = select(fileno(rl_instream) + 1, &selectset,
                       NULL, NULL, &timeout);
    if (PyOS_InputHook) PyOS_InputHook();
}
So what this is basically doing is trying to read data, but ten times a second
calling out to PyOS_InputHook(), a hook that can be used by C
extensions (in particular Tk) to process something when Python is otherwise idle.
Now the slightly silly thing is that it will wake up every 100 ms, even if PyOS_InputHook is not actually set. So a slight change:
while (!has_input)
{
    struct timeval timeout = {0, 100000}; /* 0.1 seconds */
    FD_SET(fileno(rl_instream), &selectset);
    /* select resets selectset if no input was available */
    if (PyOS_InputHook) {
        has_input = select(fileno(rl_instream) + 1, &selectset,
                           NULL, NULL, &timeout);
    } else {
        has_input = select(fileno(rl_instream) + 1, &selectset,
                           NULL, NULL, NULL);
    }
    if (PyOS_InputHook) PyOS_InputHook();
}
With this change Python is definitely ready for the enterprise!
One of the problems with pyannodex was that you could only iterate through a list of clips once. That is, in something like:
anx = annodex.Reader(sys.argv[1])
for clip in anx.clips:
    print clip.start
for clip in anx.clips:
    print clip.start
Only the first loop would print anything. This is basically because
clips returned an iterator, and once the iterator had run through once, it
didn't reset itself. I had originally (in 0.7.3.1) solved this in
a really stupid way, whereby I used class properties to recreate an
iterator object each time a clip object was returned. This was obviously
silly, and I fixed it properly by resetting the file handle in the __iter__
method of my iterator object.
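In miniature, the fix is this pattern: an object whose __iter__ rewinds the underlying handle before handing out values, so the collection can be traversed any number of times. (A sketch of the idea with a hypothetical ClipIterator, not the actual pyannodex code.)

```python
import io

class ClipIterator(object):
    def __init__(self, fh):
        self.fh = fh

    def __iter__(self):
        # The crux of the fix: reset the file handle every time a fresh
        # iteration starts, instead of leaving it at end-of-file.
        self.fh.seek(0)
        return iter(self.fh.read().split())

clips = ClipIterator(io.StringIO("intro middle outro"))
print([c for c in clips])
print([c for c in clips])  # the second pass sees the same clips
```

Without the seek(0), the second list comprehension would come back empty, which is exactly the bug described above.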
When reviewing this code I also found a nasty bug. The way my iterator worked relied on each read call causing at most one callback, which wasn't actually what was happening. Luckily this is also fixable, by having callback functions return ANX_STOP_OK rather than ANX_CONTINUE. Anyway, there is now a new version up which fixes these problems.
Jamie asked me about using splint in his SCons based projects such as filtergen. So the basic way to use SCons and splint together is through the power of pre-actions. Basically this allows you to specify a command to be run before a particular build rule is used. So I added something like this:
for object_file in filtergen_objects:
    env.AddPreAction(object_file, env["SPLINTCOM"])
to the filtergen main SConstruct file. The worst part about this is that you now need to explicitly have a list of source objects around to pass to your Program builder. This is a bit annoying, but not the end of the world. The SPLINTCOM command is set up using the SCons command interpolation like so:
env["SPLINT"] = "splint"
env["SPLINTFLAGS"] = ["-weak", "+posixlib", "-I."]
env["SPLINTCOM"] = Action("$SPLINT $CPPFLAGS $SPLINTFLAGS $SOURCES")
Currently I'm splint-ing filtergen at the weak level, and it passes, but to really use splint Jamie will need to start using splint at the standard level and fix some stuff up.
When using splint, instead of parsing the standard system headers it uses its own, which are firstly splint-clean themselves, and also add annotations to describe memory usage. Unfortunately these headers don't define all the standard functions, in particular network-related ones, so you end up with some ugly code in the source files to deal with this. If I have time I'll try and update the standard set of headers to be more fully featured.
I also found that filtergen doesn't really compile very well on non-GNU platforms, but I'm not about to try and fix that in a clean way. For Jamie's future reference, the problems involve:
For those that care, which is mostly Jamie, the patches can be merged from my bzr branch: http://benno.id.au/bzr/filtergen.splint
So I recently stumbled across Jester, which is a mutation testing tool. Mutation testing is really a way of testing your tests. The basic idea is:
So in testing your code you should first aim for complete coverage, that is ensure that your test cases exercise each line of code. Once you get here, then mutation testing can help you find other holes in your test cases.
So the interesting thing here is what mutation operations (or mutagens) you should perform on the original code. Jester provides only some very simple mutations (at least according to this paper):
Unfortunately it seems that such an approach is, well, way too simplistic, at least according to Jeff Offutt, who has published papers in the area. Basically, any rudimentary testing finds these changes, and doesn't really find the holes in your test suite.
A paper on Mothra, a mutation system for FORTRAN, describes a large number of different mutation operations. (As a coincidence, my friend has a bug tracking system called Mothra.)
Anyway, so while Jester has a python port, I didn't quite like its approach (especially since I needed to install Java, which I didn't really feel like doing). So I decided to explore how this could be done in Python.
So the first set of mutations I wanted to play with is changing some constants
to different values, e.g. changing 3 to 1 or 2. This turns out to be reasonably
easy to do in Python. I took functions as the unit of test I wanted to play with.
Python makes it easy to introspect functions. You can get a list of
constants on a function like so: function.func_code.co_consts. Now,
you can't actually modify this, but what you can do is make a new
copy of the function with a different set of constants. This is convenient, because
in mutation testing we want to create mutants. So:
def f_new_consts(f, newconsts):
    """Return a copy of function f, but change its constants to newconsts."""
    co = f.func_code
    codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
                       co.co_flags, co.co_code, tuple(newconsts), co.co_names,
                       co.co_varnames, co.co_filename, co.co_name,
                       co.co_firstlineno, co.co_lnotab, co.co_freevars,
                       co.co_cellvars)
    new_function = type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
                           f.func_closure)
    return new_function
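As an aside, on Python 3.8 and later the same trick no longer needs the long CodeType constructor call, since code objects grew a replace() method and the attribute names changed (func_code became __code__, and so on). A sketch:

```python
def with_consts(f, newconsts):
    """Python 3 version: copy f with a different co_consts tuple."""
    code = f.__code__.replace(co_consts=tuple(newconsts))
    return type(f)(code, f.__globals__, f.__name__,
                   f.__defaults__, f.__closure__)

def three():
    return 3

# Build the mutant by mapping the constant 3 to 2, leaving any other
# constants (such as the docstring slot) untouched.
mutant = with_consts(three, tuple(2 if c == 3 else c
                                  for c in three.__code__.co_consts))
print(three(), mutant())
```

The original function is untouched; only the mutant returns the altered value, which is exactly what a mutation tester wants.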
So that is the basic mechanism I'm using for creating mutant functions with
different constants. The other mutants I wanted to make were those where
I change comparison operators, e.g. changing < to <= or >. This is a bit trickier;
it means getting down and dirty with the Python byte code, or at least that is my
preferred approach. I guess you could also make syntactic changes and recompile.
Anyway, you can get a string representation of a function's bytecode like so:
function.func_code.co_code. To do something useful with this you
really want to convert it to a list of integers, which is easy:
[ord(x) for x in function.func_code.co_code].
So far, so good. What you need to be able to do next is iterate through the different bytecodes; this is a little tricky as they can be either 1 or 3 bytes long. A loop like:
from opcode import opname, HAVE_ARGUMENT

opcodes = [ord(x) for x in co.co_code]
i = 0
while i < len(opcodes):
    opcode = opcodes[i]
    print opname[opcode]
    i += 1
    if opcode >= HAVE_ARGUMENT:
        i += 2
will iterate through the opcodes, printing each one. Basically, all bytecodes numbered HAVE_ARGUMENT or higher are three bytes long.
From here it is possible to find all the COMPARE_OP
bytecodes. The first argument byte specifies the type of compare operation;
these can be decoded using the opcode.cmp_op tuple.
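On Python 3, the dis module does this walking and decoding for you, which sidesteps the byte-length bookkeeping entirely. A quick sketch of locating the comparisons in a function:

```python
import dis

def smaller(a, b):
    return a < b

# dis.get_instructions yields decoded Instruction objects; for COMPARE_OP
# the argrepr names the operator, so no manual cmp_op lookup is needed.
compares = [ins for ins in dis.get_instructions(smaller)
            if ins.opname == "COMPARE_OP"]
for ins in compares:
    print(ins.opname, ins.argrepr)
```

A comparison-operator mutagen would then rewrite just those instructions, leaving the rest of the bytecode alone.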
So, at this point I have some basic proof-of-concept code for creating some very simple mutants. The next step is to try and integrate this with my unit testing and coverage testing framework, and then my code will never have bugs in it again!