I’m really impressed with the way the HTML5 spec is going, and the fact that it is quickly going to become the default choice for portable application development.
One of the latest additions to help support application
development is the File
API. This API enables a developer to gain access to the contents
of files locally. The main new data structure that a developer is
provided with is the FileList object, which represents an
array of File objects. FileList objects
can be obtained from two places: file input elements in forms,
and drag & drop DataTransfer objects.
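To give a feel for what a FileList looks like from code, here is a minimal sketch. A FileList is array-like (it has a length and numeric indices) but is not a true Array, so a plain for loop is the safe way to walk it. The helper name describeFiles is made up for this example.

```javascript
// Hypothetical helper: summarise the File objects in a FileList.
// It only relies on length, numeric indices, name and size, so it
// can be tried with any array-like object.
function describeFiles(files) {
    var out = [];
    var i;
    for (i = 0; i < files.length; i++) {
        // name and size are attributes of each File object
        out.push(files[i].name + " (" + files[i].size + " bytes)");
    }
    return out;
}
```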
Based on this latest API, I’ve created a simple library, JsJpegMeta, for parsing JPEG meta-data.
I’ve hacked together an example that
demonstrates the library. Just select a JPEG file from the form, or
drag a JPEG file onto the window. For large JPEG files you might need
to be a little bit patient, as it can be a little slow. This slowness, surprisingly,
doesn’t appear to be the JavaScript part, but rather Firefox’s handling of large
data: URLs and JPEG display in general.
The rest of this post goes into some of the details. Unfortunately only Firefox 3.6 supports these new APIs right now.
Here is an example of how to get access to a FileList.
When the user chooses a file, the onchange handler calls the JavaScript function
loadFiles (assuming you have already defined that
function).
<form id="form" action="javascript:void(0)">
<p>Choose file: <input type="file" onchange="loadFiles(this.files)" /></p>
</form>
A File object just provides a reference to a file; to
actually get some data out of the file you need to use a
FileReader object. The FileReader object
provides an asynchronous API for reading the file data into
memory. Three different methods are provided by the
FileReader object: readAsBinaryString,
readAsText and readAsDataURL. A callback,
onloadend, is executed when the file has been read into
memory; the data is then available via the result field.
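Here is an example of what the loadFiles function might look like:
function loadFiles(files) {
    var binary_reader = new FileReader();
    /* Stash the File on the reader so the callback can get at its name */
    binary_reader.file = files[0];
    binary_reader.onloadend = function() {
        alert("Loaded file: " + this.file.name + " length: " + this.result.length);
    };
    /* Start the asynchronous read; onloadend fires when it finishes */
    binary_reader.readAsBinaryString(files[0]);
    /* $ is assumed to be a helper that looks up an element by its id */
    $("form").reset();
}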
Note that the $("form").reset(); call clears the input form.
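The example above only reads the first file, and only as a binary string. As a rough sketch of how the three read methods could be wrapped up, the helper below picks one by name and hands the result to a callback. The names readFile and loadAllFiles, and the "binary" / "text" / "dataurl" labels, are my own inventions for illustration and are not part of the File API.

```javascript
// Hypothetical wrapper around FileReader. "how" is one of "binary",
// "text" or "dataurl"; callback receives (file, result).
function readFile(file, how, callback) {
    var methods = {
        binary: "readAsBinaryString",
        text: "readAsText",
        dataurl: "readAsDataURL"
    };
    if (!Object.prototype.hasOwnProperty.call(methods, how)) {
        throw new Error("Unknown read type: " + how);
    }
    var reader = new FileReader();
    reader.onloadend = function () {
        // onloadend also fires if the read fails, so a careful caller
        // should check that a result is actually present.
        callback(file, reader.result);
    };
    reader[methods[how]](file);
}

// Read every file in a FileList as a binary string, not just the first.
function loadAllFiles(files, callback) {
    var i;
    for (i = 0; i < files.length; i++) {
        readFile(files[i], "binary", callback);
    }
}
```

Because the callback closes over file, there is no need to attach the File to the reader as an extra property.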
Forms are not the only way to get a FileList; you can
also get files from a drag and drop
event. You need to handle three events: dragenter, dragover and drop.
<body ondragenter="dragEnterHandler(event)" ondragover="dragOverHandler(event)" ondrop="dropHandler(event)">
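The handling of these is fairly straightforward:
/* Cancelling dragenter and dragover tells the browser a drop is allowed here */
function dragEnterHandler(e) { e.preventDefault(); }
function dragOverHandler(e) { e.preventDefault(); }
function dropHandler(e) {
    /* Stop the browser from navigating to the dropped file */
    e.preventDefault();
    loadFiles(e.dataTransfer.files);
}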
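A drop can contain anything, not just JPEG files. One way to guard against that, sketched below, is to filter the FileList on the type attribute of each File before reading anything. The helper name jpegFilesOnly is hypothetical, and the check is only as good as the MIME type the browser reports.

```javascript
// Hypothetical helper: keep only the files the browser reports as JPEGs.
function jpegFilesOnly(files) {
    var out = [];
    var i;
    for (i = 0; i < files.length; i++) {
        // type is the MIME type as reported by the browser; it can be
        // an empty string when the browser cannot work it out.
        if (files[i].type === "image/jpeg") {
            out.push(files[i]);
        }
    }
    return out;
}
```

The result is a plain array, which may be empty, so a caller should check its length before indexing into it.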
The interesting part of loadFiles is readAsBinaryString:
when this method is used, result ends up being a
binary string. This is pretty new because, as far
as I know, there hasn’t really been a good way to access binary data
in JavaScript before. Each character in the binary string represents a
byte, and has a character code in the range [0..255].
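To make the one-character-per-byte idea concrete, here is a small sketch that turns a binary string into an array of byte values and checks for the two bytes, 0xFF 0xD8, that start every JPEG file. The function names toBytes and looksLikeJpeg are made up for this example.

```javascript
// Convert a binary string into an array of byte values (0..255).
function toBytes(data) {
    var bytes = [];
    var i;
    for (i = 0; i < data.length; i++) {
        // Mask to a single byte in case a non-binary string is passed in
        bytes.push(data.charCodeAt(i) & 0xff);
    }
    return bytes;
}

// A JPEG file starts with the two-byte SOI marker 0xFF 0xD8.
function looksLikeJpeg(data) {
    return data.length >= 2 &&
        data.charCodeAt(0) === 0xff &&
        data.charCodeAt(1) === 0xd8;
}
```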
Having the bytes available like this is great, because it means that we can parse binary data locally, without having to upload files to a server for processing. Unfortunately there isn’t a great deal of support for handling binary data in JavaScript; there isn’t anything like Python’s struct module.
Luckily it isn’t too hard to write something close to this. Mostly we want to parse unsigned and signed integers of various sizes. To be useful, we need to handle both little and big endianness. A very simple implementation of parsing an unsigned integer is:
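function parseNum(endian, data, offset, size) {
    var i;
    var ret = 0;  /* Start from zero so an empty range yields 0 */
    var big_endian = (endian === ">");
    if (offset === undefined) offset = 0;
    if (size === undefined) size = data.length - offset;
    /* Always walk from the most significant byte to the least */
    for (big_endian ? i = offset : i = offset + size - 1;
         big_endian ? i < offset + size : i >= offset;
         big_endian ? i++ : i--) {
        /* Multiply rather than shift: << works on signed 32-bit values,
           so a 4-byte number with the top bit set would come out negative */
        ret *= 256;
        ret += data.charCodeAt(i);
    }
    return ret;
}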
endian specifies the endianness: the string literal ">"
for big-endian and "<" for little-endian (copying the Python struct
module). data is the binary data to parse. An
offset can be specified to enable parsing from the middle
of a binary structure; this defaults to zero. The size of
the integer to parse can also be specified; it defaults to the
remainder of the string.
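As a worked example of the arithmetic, consider the start of a JPEG segment: a two-byte marker followed by a two-byte, big-endian length. The sketch below spells the calculation out by hand rather than calling parseNum, so it runs on its own; parseNum(">", header, 2, 2) should arrive at the same number. The helper name segmentLength is hypothetical.

```javascript
// Hypothetical helper: read the big-endian length field of the JPEG
// segment whose two-byte marker starts at "offset".
function segmentLength(data, offset) {
    var hi = data.charCodeAt(offset + 2);  // most significant byte
    var lo = data.charCodeAt(offset + 3);  // least significant byte
    return hi * 256 + lo;
}

// APP1 marker (0xFF 0xE1) followed by the length bytes 0x00 0x10
var header = "\xff\xe1\x00\x10";

// Big-endian: 0x00 * 256 + 0x10 = 16
var segLen = segmentLength(header, 0);

// The same two bytes read as little-endian: 0x10 * 256 + 0x00 = 4096
var swapped = header.charCodeAt(3) * 256 + header.charCodeAt(2);
```

Getting the byte order wrong turns 16 into 4096, which is why the endian argument matters.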
Signed integers require a little bit more work. Although there are
multiple ways of representing
signed numbers, by far the most common is the two’s
complement method. A function that has the same inputs as parseNum
is:
function parseSnum(endian, data, offset, size) {
    var i;
    var ret = 0;  /* Start from zero so an empty range yields 0 */
    var neg;
    var big_endian = (endian === ">");
    if (offset === undefined) offset = 0;
    if (size === undefined) size = data.length - offset;
    for (big_endian ? i = offset : i = offset + size - 1;
         big_endian ? i < offset + size : i >= offset;
         big_endian ? i++ : i--) {
        if (neg === undefined) {
            /* Negative if top bit is set */
            neg = (data.charCodeAt(i) & 0x80) === 0x80;
        }
        /* Same as a shift by 8 bits, without the 32-bit limit */
        ret *= 256;
        /* If it is negative we invert the bits */
        ret += neg ? ~data.charCodeAt(i) & 0xff : data.charCodeAt(i);
    }
    if (neg) {
        /* If it is negative we do two's complement */
        ret += 1;
        ret *= -1;
    }
    return ret;
}
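As a cross-check on the function above, here is a different way to reach the same answer: parse the bytes as an unsigned number first, then subtract 2^(8 * size) if the top bit was set. This is not what parseSnum does internally, just an alternative sketch, and the name toSigned is made up.

```javascript
// Hypothetical alternative: reinterpret an unsigned value that was read
// from "size" bytes as a two's complement signed value.
function toSigned(value, size) {
    var limit = Math.pow(2, 8 * size);  // e.g. 65536 for two bytes
    // A set top bit puts the value in the upper half of the range
    return value >= limit / 2 ? value - limit : value;
}
```

For example, the two bytes 0xFF 0xFE read big-endian give 65534 unsigned, and toSigned(65534, 2) gives -2.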
JpegMeta is a
simple, pure JavaScript library for parsing JPEG meta-data. To use it,
include the jpegmeta.js file. This creates a single,
global module object, JpegMeta. The JpegMeta
module object has one public interface of use, the JpegFile
class. You can use this to construct new JpegFile instances. The input
is a binary string (for example, as returned from a FileReader object).
An example is:
var jpeg = new JpegMeta.JpegFile(this.result, this.file.name);
After creation you can then access various meta-data properties, categorised by meta-data groups. The main groups of meta-data are:
Meta-data groups can be accessed directly, for example:
var group = jpeg.gps;
A lookup table is also provided: jpeg.metaGroups. This
associative array can be used to determine which meta-groups a
particular jpeg file instance actually has.
The MetaGroup object has a name field, a description field and
an associative array of properties.
Properties in a given group can be accessed directly. For example:
var lat = jpeg.gps.latitude;
Alternatively, the metaProps associative array
can be used to determine which properties are available.
The metaProp object has a name field,
description field, and also a value
field.
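Putting the two lookup tables together, the sketch below walks every group and every property of a parsed file and collects them as lines of text. It assumes the fields are spelled description and value, and that each group keeps its properties in metaProps, as described above; check the library source for the exact names. The function name listMetaData is hypothetical.

```javascript
// Hypothetical helper: list every meta-data property of a JpegFile.
// Assumes groups expose "description" and "metaProps", and properties
// expose "description" and "value".
function listMetaData(jpeg) {
    var lines = [];
    var has = Object.prototype.hasOwnProperty;
    var groupKey, propKey, group, prop;
    for (groupKey in jpeg.metaGroups) {
        if (!has.call(jpeg.metaGroups, groupKey)) continue;
        group = jpeg.metaGroups[groupKey];
        lines.push(group.description + ":");
        for (propKey in group.metaProps) {
            if (!has.call(group.metaProps, propKey)) continue;
            prop = group.metaProps[propKey];
            lines.push("  " + prop.description + ": " + prop.value);
        }
    }
    return lines;
}
```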
The File API adds a powerful new capability to native HTML5 applications.