JavaScript is a very permissive language; you can go messing around with the innards of other classes to your heart’s content. Of course, the question is: should you?
Currently I’m playing around with the channel messaging feature in HTML5. In a nutshell, this API lets you create a communication MessageChannel, which has two MessagePort objects associated with it. When you send a message on port1, an event is triggered on port2 (and vice versa). You send a message by calling the postMessage method on a port object. E.g.:
port.postMessage("Hello world");
This opens up a range of interesting possibilities for web developers, but this blog post is about software design, not cool HTML5 features.
Unfortunately the postMessage method is very new, and the implementation has not yet caught up with the specification. Although you are meant to be able to transfer objects using postMessage, currently only strings are supported, and any other objects are coerced into strings. This has an interesting side-effect. If we have code such as:
port.postMessage({"op" : "test"});
the receiver of the message ends up with the string [object Object], which is mostly useless; actually it is completely useless. So, we want to transfer structured data over a pipe that just supports standard strings. Sounds like a job for JSON.
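To make the coercion concrete, here is a minimal sketch that can be run without a channel at all. It assumes the receiving side simply calls JSON.parse on the string it is handed; the variable names are made up for illustration.

```javascript
// What a strings-only postMessage effectively does to an object argument.
var message = {"op": "test"};
var coerced = String(message);

// Encoding to JSON first keeps the structure recoverable on the other side.
var encoded = JSON.stringify(message);
var decoded = JSON.parse(encoded);

console.log(coerced);    // [object Object]
console.log(encoded);    // {"op":"test"}
console.log(decoded.op); // test
```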
So, now my code ends up looking like:
port.postMessage(JSON.stringify({"op" : "test"}));
Now this is all good, but it gets a bit tedious having to type that out every time I want to post a message, so as a naïve approach we can simply create a new function, postObject:
function postObject(port, object) {
    return port.postMessage(JSON.stringify(object));
}
postObject(port, {"op" : "test"});
OK, so this works, and it is pretty simple to use, but there is an aesthetic issue here that grates just a bit: why do I have to write postObject(port, object)? Why can’t I write port.postObject(object)? That is more “object-oriented”. Thankfully, JavaScript lets us monkey patch objects at run time. So, we can do this:
port.postObject = function (object) {
    return this.postMessage(JSON.stringify(object));
};
port.postObject({"op" : "test"});
OK, so far so good. What problems does this have? Well, firstly we are creating a new function for each port object, which isn’t great for either execution time, or memory usage if we have a large number of ports. So instead we could do:
function postObject(object) {
    return this.postMessage(JSON.stringify(object));
}
port.postObject = postObject;
port.postObject({"op" : "test"});
This works well, except we have two problems. The postObject function ends up as a global function, and if you called it as a global function, the this parameter would be the global window object rather than a port object, which would be an easy mistake to make and difficult to debug. Additionally, we end up with additional per-object data for storing the pointer. Thankfully JavaScript has a powerful mechanism for solving both problems: the prototype object (not to be confused with the JavaScript framework of the same name).
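The first of those problems, the this pitfall, can be reproduced with a stand-in object, so no browser is needed. In this sketch fakePort is a hypothetical substitute for a real port, and strict mode is switched on inside the function so that a call without a receiver fails immediately rather than quietly using the global object.

```javascript
// A hypothetical stand-in for a real MessagePort; it just records messages.
var fakePort = {
    sent: [],
    postMessage: function (message) {
        this.sent.push(message);
    }
};

function postObject(object) {
    "use strict"; // without a receiver, `this` is undefined instead of the global object
    return this.postMessage(JSON.stringify(object));
}

// Called as a method: `this` is fakePort, so the message is recorded.
fakePort.postObject = postObject;
fakePort.postObject({"op": "test"});

// Called as a plain function: there is no receiver, so the call throws.
var failed = false;
try {
    postObject({"op": "test"});
} catch (e) {
    failed = e instanceof TypeError;
}

console.log(fakePort.sent[0]); // {"op":"test"}
console.log(failed);           // true
```

Without the strict-mode directive the same call would go looking for postMessage on the global object, which is exactly the hard-to-debug case.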
So, if we update the prototype object, instead of the object directly, we don’t need to add the function to the global namespace, we avoid per-object memory usage, and we avoid extra code having to remember to set it for every port object:
MessagePort.prototype.postObject = function postObject(object) {
    return this.postMessage(JSON.stringify(object));
};
port.postObject({"op" : "test"});
Now, this ends up working pretty well. The real question is: should we do this? Or rather, which of these options should we choose? There is an argument to be made that monkey-patching anything is just plain wrong, and it should be avoided. For example, rather than extending the base MessagePort class, we could create a sub-class. (Exactly how you create sub-classes in JavaScript is another matter!)
Unfortunately sub-classing doesn’t get us too far, as we are not the ones directly creating the MessagePort instance; the MessageChannel constructor does this for us. (I guess we could monkey-patch MessageChannel, but that defeats the purpose of avoiding monkey-patching!)
Of course, another option would be to create a class that encapsulates a port, taking one as a constructor. E.g:
function MyPort(port) {
    this.port = port;
}
MyPort.prototype.postObject = function postObject(object) {
    this.port.postMessage(JSON.stringify(object));
};
port = new MyPort(port); /* "new" is required; a plain call would not create a wrapper */
port.postObject({"op" : "test"});
Of course, this means we have to remember to wrap all our port objects in this MyPort class. This is kind of messy (in my opinion). Also, we can no longer call the standard port methods. Of course, we could create wrappers for all these methods too, but then things are getting quite verbose, and we are stuffed if it comes to inspecting properties.
Unlike some other object-oriented languages, JavaScript provides another alternative: we could change the class (i.e. the prototype) of the object at runtime. E.g.:
function MyPort(port) { }
MyPort.deriveFrom(MessagePort); /* Assume this is how we create sub-classes */
MyPort.prototype.postObject = function (object) {
    this.postMessage(JSON.stringify(object));
};
port.__proto__ = MyPort.prototype; /* Change class at runtime */
port.postObject({"op" : "test"});
This solves most of the problems of the encapsulated approach, but we still have to remember to adapt the object, and additionally, __proto__ is a non-standard extension.
OK, so after quickly looking at the sub-classing approaches I think it is fair to discount them. We are still left with trying to determine if any of the monkey-patching approaches is better than a simple function call.
So, there is mostly a consensus out there that monkey patching the base object is verboten, but what about other objects?
Well, if you are in your own code, I think it is a case of anything goes, but what if we are providing reusable code modules for other people? (Of course, even in your own code, there might be libraries that you include that are affected by your overloading.) When base objects start working in weird and wonderful ways just because you import a module, debugging becomes quite painful. So I think changing the underlying implementation (like the example does when monkey-patching the postMessage method) should probably be avoided.
OK, so now the choices are down to a function vs. adding a method to the built-in class’s prototype. If we just add a new global function, we could conflict with any library that gives a global function the same name. If we add a method to the prototype, at least we limit the namespace pollution to just the MessagePort object; but really, neither option is ideal.
The accepted way to get around this problem is to create a module-specific namespace. This reduces the number of potential conflicts. E.g.:
var Benno = {};
Benno.postObject = function postObject(port, object) {
    return port.postMessage(JSON.stringify(object));
};
Benno.postObject(port, {"op" : "test"});
Now, this avoids polluting the global namespace (except for the single Benno object). So, it would have to come out above the prototype extension approach. Now, we should consider if it is possible to play any namespace tricks with the prototype approach. It might be nice to think we could do something like:
MessagePort.prototype.Benno = {};
MessagePort.prototype.Benno.postObject = function postObject(object) {
    return this.postMessage(JSON.stringify(object));
};
port.Benno.postObject(object);
but this doesn’t work because of the way in which methods and the this object work. this in the function ends up referring to the Benno module object, rather than the MessagePort object.
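A small sketch shows why. FakePort is a hypothetical stand-in for MessagePort (so this runs anywhere), and whoAmI is a made-up method that just returns its this value.

```javascript
// FakePort is a hypothetical stand-in for MessagePort.
function FakePort() {}

// A nested "namespace" object hung off the prototype.
FakePort.prototype.Benno = {};
FakePort.prototype.Benno.whoAmI = function whoAmI() {
    return this;
};

var port = new FakePort();

// The receiver of the call is whatever sits directly left of the final dot,
// which here is port.Benno rather than port.
var receiver = port.Benno.whoAmI();

console.log(receiver === port);                     // false
console.log(receiver === FakePort.prototype.Benno); // true
```

Working around this needs an explicit receiver, for example port.Benno.whoAmI.call(port), which is clumsier than a plain namespaced function.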
Even assuming this did work, the function approach has some additional benefits. If the user wants to reduce the typing they can do something like:
var $B = Benno; $B.postObject(port, object);
or even,
var $P = Benno.postObject; $P(port, object);
The other advantage of this scheme is that for someone debugging the code it should be much more obvious where to look for the code and to understand what is happening. If you were reading code and saw Benno.postObject(port, object), it would be much more obvious where the code came from, and where to start looking to debug things.
So, in conclusion, the best approach is also the simplest: just write a function (but put it in a decent namespace first). Sure, instance.operation(args) looks nicer than operation(instance, args), but in the end the ability to namespace the function, along with the advantage of making a clear distinction between built-in and added functionality, means that the latter solution wins the day in my eyes.
If you have some other ideas on this I’d love to hear them, so please drop me an e-mail. Thanks to Nicholas for his insights here.
I’m really impressed with the way the HTML5 spec is going, and the fact that it is quickly going to become the default choice for portable application development.
One of the latest additions to help support application development is the File API. This API enables a developer to gain access to the contents of files locally. The main new data structure that a developer is provided with is the FileList object, which represents an array of File objects. FileList objects can be obtained from two places: input form elements and drag & drop DataTransfer objects.
Based on this latest API, I’ve created a simple library, JsJpegMeta for parsing Jpeg meta data.
I’ve hacked together an example that demonstrates the library. Just select a JPEG file from the form, or drag a JPEG file onto the window. For large JPEG files you might need to be a little bit patient, as it can be a little slow. This slowness, surprisingly, doesn’t appear to be the JavaScript part, but rather Firefox’s handling of large data: URLs and JPEG display in general.
The rest of this post goes into some of the details. Unfortunately only Firefox 3.6 supports these new APIs right now.
Here is an example of how to get access to a FileList. When the user chooses a file, the onchange handler calls the JavaScript function loadFiles (assuming you have already defined that function).
<form id="form" action="javascript:void(0)"> <p>Choose file: <input type="file" onchange="loadFiles(this.files)" /></p> </form>
A File object just provides a reference to a file; to actually get some data out of the file you need to use a FileReader object. The FileReader object provides an asynchronous API for reading the file data into memory. Three different methods are provided by the FileReader object: readAsBinaryString, readAsText and readAsDataURL. A callback, onloadend, is executed when the file has been read into memory; the data is then available via the result field.
Here is an example of what the loadFiles function might look like:
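function loadFiles(files) {
    var binary_reader = new FileReader();
    /* Stash the File on the reader so the callback can get at its name */
    binary_reader.file = files[0];
    binary_reader.onloadend = function() {
        alert("Loaded file: " + this.file.name + " length: " + this.result.length);
    };
    binary_reader.readAsBinaryString(files[0]);
    /* $ is assumed to be a helper that returns the DOM element with the given id */
    $("form").reset();
}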
Note that the $("form").reset(); call clears the input form.
Forms are not the only way to get a FileList; you can also get files from drag and drop events. You need to handle three events: dragenter, dragover and drop.
<body ondragenter="dragEnterHandler(event)" ondragover="dragOverHandler(event)" ondrop="dropHandler(event)">
The default handling of these is fairly straightforward:
function dragEnterHandler(e) {
    e.preventDefault();
}
function dragOverHandler(e) {
    e.preventDefault();
}
function dropHandler(e) {
    e.preventDefault();
    loadFiles(e.dataTransfer.files);
}
The interesting thing here is readAsBinaryString: when this method is used, result ends up being a binary string. This is pretty new because, as far as I know, there hasn’t really been a good way to access binary data in JavaScript before. Each character in the binary string represents a byte, and has a character code in the range [0..255].
This is great, because it means that we can parse binary strings locally, without having to upload files to a server for processing. Unfortunately there isn’t a great deal of support for handling binary data in JavaScript; there isn’t anything like Python’s struct module.
Luckily it isn’t too hard to write something close to this. Mostly we want to parse unsigned and signed integers of arbitrary length. To be useful, we need to handle both little and big endianness. A very simple implementation of parsing an unsigned integer is:
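function parseNum(endian, data, offset, size) {
    var i;
    var ret = 0; /* Start from zero so an empty range returns 0, not undefined */
    var big_endian = (endian === ">");
    if (offset === undefined) offset = 0;
    if (size === undefined) size = data.length - offset;
    /* Visit the bytes from most significant to least significant */
    for (big_endian ? i = offset : i = offset + size - 1;
         big_endian ? i < offset + size : i >= offset;
         big_endian ? i++ : i--) {
        /* <<= works on 32-bit signed integers, so a 4-byte value with its top bit set comes out negative */
        ret <<= 8;
        ret += data.charCodeAt(i);
    }
    return ret;
}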
endian specifies the endianness: the string literal ">" for big-endian and "<" for little-endian (copying the Python struct module). data is the binary data to parse. An offset can be specified to enable parsing from the middle of a binary structure; this defaults to zero. The size of the integer to parse can also be specified; it defaults to the remainder of the string.
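As a quick worked example, the two bytes 0x12 0x34 read as 0x1234 (4660) big-endian and 0x3412 (13330) little-endian. The sketch below restates parseNum compactly, with ret initialised to zero and a simpler loop, so that it is self-contained; the sample string is made up.

```javascript
function parseNum(endian, data, offset, size) {
    var bigEndian = (endian === ">");
    var ret = 0;
    var i, index;

    if (offset === undefined) offset = 0;
    if (size === undefined) size = data.length - offset;

    for (i = 0; i < size; i++) {
        // Walk from the most significant byte to the least significant one.
        index = bigEndian ? offset + i : offset + size - 1 - i;
        ret = (ret << 8) + data.charCodeAt(index);
    }
    return ret;
}

// Three bytes: 0x12, 0x34, 0xff.
var sample = "\x12\x34\xff";

console.log(parseNum(">", sample, 0, 2)); // 4660  (0x1234)
console.log(parseNum("<", sample, 0, 2)); // 13330 (0x3412)
console.log(parseNum(">", sample, 2, 1)); // 255
```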
Signed integers require a little bit more work. Although there are multiple ways of representing signed numbers, by far the most common is the two’s complement method. A function that has the same inputs as parseNum is:
function parseSnum(endian, data, offset, size) {
    var i;
    var ret = 0; /* Start from zero so an empty range returns 0, not undefined */
    var neg;
    var big_endian = (endian === ">");
    if (offset === undefined) offset = 0;
    if (size === undefined) size = data.length - offset;
    for (big_endian ? i = offset : i = offset + size - 1;
         big_endian ? i < offset + size : i >= offset;
         big_endian ? i++ : i--) {
        if (neg === undefined) {
            /* Negative if top bit is set */
            neg = (data.charCodeAt(i) & 0x80) === 0x80;
        }
        ret <<= 8;
        /* If it is negative we invert the bits */
        ret += neg ? ~data.charCodeAt(i) & 0xff : data.charCodeAt(i);
    }
    if (neg) {
        /* If it is negative we do two's complement */
        ret += 1;
        ret *= -1;
    }
    return ret;
}
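A different way to arrive at the same value, useful as a cross-check, is to parse the bytes as unsigned and then subtract 2^(8 × size) when the top bit is set. This is not the invert-and-add-one technique used above, just an equivalent formulation. The helper names below are hypothetical, and the sketch only handles big-endian input to stay short.

```javascript
// Parse a whole binary string as a big-endian unsigned integer.
function parseUnsignedBE(data) {
    var ret = 0;
    for (var i = 0; i < data.length; i++) {
        // Multiplying avoids the 32-bit limit of the shift operators.
        ret = ret * 256 + data.charCodeAt(i);
    }
    return ret;
}

// Reinterpret an unsigned value of `size` bytes as two's complement.
function toSigned(value, size) {
    var limit = Math.pow(2, 8 * size);
    return value >= limit / 2 ? value - limit : value;
}

function parseSignedBE(data) {
    return toSigned(parseUnsignedBE(data), data.length);
}

console.log(parseSignedBE("\xff\xfe")); // -2
console.log(parseSignedBE("\x7f\xff")); // 32767
console.log(parseSignedBE("\x80\x00")); // -32768
```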
JpegMeta is a simple, pure JavaScript library for parsing Jpeg meta-data. To use it, include the jpegmeta.js file. This creates a single, global module object, JpegMeta. The JpegMeta module object has one public interface of use, the JpegFile class. You can use this to construct new JpegFile class instances. The input is a binary string (for example, as returned from a FileReader object).
An example is:
var jpeg = new JpegMeta.JpegFile(this.result, this.file.name);
After creation you can then access various meta-data properties, categorised by meta-data groups. The main groups of meta-data are:
Meta-data groups can be accessed directly, for example:
var group = jpeg.gps;
A lookup table is also provided: jpeg.metaGroups. This associative array can be used to determine which meta-groups a particular Jpeg file instance actually has.
The MetaGroup object has a name field, a description field and an associative array of properties.
Properties in a given group can be accessed directly. E.g:
var lat = jpeg.gps.latitude;
Alternatively, the metaProps associative array can be used to determine which properties are available.
The metaProp object has a name field, a description field, and also a value field.
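Putting the two lookup tables together, a walk over everything a file contains might look like the sketch below. The object passed in here is a hand-built stand-in shaped according to the description above, not a real JpegFile, and the assumptions that each group exposes its properties as metaProps and that the fields are called name, description and value should be checked against the library itself. listMeta is a made-up helper.

```javascript
// Hypothetical helper: flatten every group/property pair into a line of text.
function listMeta(jpeg) {
    var hasOwn = Object.prototype.hasOwnProperty;
    var lines = [];
    var groupKey, propKey, group, prop;

    for (groupKey in jpeg.metaGroups) {
        if (!hasOwn.call(jpeg.metaGroups, groupKey)) continue;
        group = jpeg.metaGroups[groupKey];

        for (propKey in group.metaProps) {
            if (!hasOwn.call(group.metaProps, propKey)) continue;
            prop = group.metaProps[propKey];
            lines.push(group.description + " / " + prop.description + ": " + prop.value);
        }
    }
    return lines;
}

// Stand-in data with assumed field names; a real JpegFile may differ.
var standIn = {
    metaGroups: {
        gps: {
            name: "gps",
            description: "GPS",
            metaProps: {
                latitude: {name: "latitude", description: "Latitude", value: -33.87}
            }
        }
    }
};

var lines = listMeta(standIn);
console.log(lines.join("\n"));
```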
The File API adds a powerful new capability to native HTML5 applications.