tl;dr
Last time I wrote about some of the idiosyncrasies in the way in which you deal with exceptions in node.js. This time, I’m looking at a phenomenon I’m calling semi-asynchronous functions.
Let’s start with a simple asynchronous function. We have a function
x which sets the value of two global variables. Of course
global variables are bad, so you could imagine that x is
a method and it is updating some fields on the current object if it makes
you feel better. Of course some will argue that any mutable state is bad, but
now we are getting side-tracked!
var a = 0
var b = 0
function x(new_a, new_b) {
    a = new_a
    b = new_b
}
So, here we have a pretty simple function, and it is pretty easy to state the
post-condition that we expect: when x returns,
a will have the value of the first argument and b will
have the value of the second argument.
So, let’s just write some code to quickly test our expectations:
x(5, 6)
console.log(a, b)
As expected this will print 5 6 to the console.
Now, if x is changed to be an asynchronous function things
get a little bit more interesting. We’ll make x asynchronous
by doing the work on the next tick:
function x(new_a, new_b, callback) {
    function doIt() {
        a = new_a
        b = new_b
        callback()
    }
    process.nextTick(doIt)
}
Now, we can guarantee something about the values of a and b when the callback is executed, but what about immediately after calling? Well, with this particular implementation, we can guarantee that a and b will be unchanged.
function done() {
    console.log("Done", a, b)
}
x(5, 6, done)
console.log("Called", a, b)
Running this we see that our expectations hold. a and b are 0 after
x is called, but are 5 and 6 by the time the callback
is executed.
Of course, another valid implementation of x could really
mess up some of these assumptions. We could instead implement it like so:
function x(new_a, new_b, callback) {
    a = new_a
    function doIt() {
        b = new_b
        callback()
    }
    process.nextTick(doIt)
}
Now we get quite a different result. After x is called,
a has been modified, but b remains unchanged. This is what I call a
semi-asynchronous function; part of the work is
done synchronously, while the remainder happens some time later.
Just in case you are thinking at this point that this is slightly academic, there are real functions in the node.js library that are implemented in this semi-asynchronous fashion.
Now, as a caller faced with a semi-asynchronous function, how exactly should you use it? If the documentation clearly states which parts happen synchronously and which happen asynchronously, and that split is part of the interface, then it is relatively simple. However, most functions are not documented this way, so we can only make assumptions.
If we are conservative, then we really need to assume that anything modified by the function must be in an undefined state until the callback is executed. Hopefully the documentation makes it clear what is being mutated so we don’t have to assume the state of the entire program is undefined.
Put another way, after calling x we should not rely on
the values of a and b in any way until the callback has run, and the implementer of x
should feel free to change when in the program flow a and/or b is
updated.
So can we rely on anything? Well, it might be nice to rely on the
order in which some code is executed. With both of the callback-taking implementations
of x so far, we have been able to guarantee that the
code immediately following the call executes before the asynchronous
callback executes. Well, that would be nice, but what if x
is implemented like so:
function x(new_a, new_b, callback) {
    a = new_a
    b = new_b
    callback()
}
In this case, the callback will be executed before the
code following the call to x. So, there are two questions
to think about. Is the current formulation of x a valid
approach? And secondly, is it valid to rely on the code ordering?
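To make the ordering difference concrete, here is a minimal sketch that records which runs first: the callback, or the code following the call. The names xDeferred, xImmediate, trace and events are made up for this illustration, and it assumes a Node.js version that provides setImmediate.

```javascript
// Illustrative names only: xDeferred, xImmediate and trace are not from any library.
var events = []

// Runs the callback on the next tick, like the earlier versions of x.
function xDeferred(callback) {
    process.nextTick(callback)
}

// Runs the callback straight away, like the latest version of x.
function xImmediate(callback) {
    callback()
}

// Calls impl and records whether the callback or the following code ran first.
function trace(name, impl) {
    impl(function () {
        events.push(name + ": callback")
    })
    events.push(name + ": code after call")
}

trace("deferred", xDeferred)
trace("immediate", xImmediate)

// Print once the deferred callback has had a chance to run.
setImmediate(function () {
    console.log(events.join("\n"))
})
```

For the deferred version the "code after call" entry is recorded before the callback entry; for the immediate version the two are the other way around.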
While you think about that, let me introduce another interesting
issue. Let’s say we want to execute x many times in
series (i.e. don’t start the next x operation until the
previous one has finished, meaning it has executed its callback). Well,
of course, you can’t just use something as simple as a for loop; that would
be far too easy, and it would be difficult to prove how cool you are at
programming if you could just use a for loop. No, instead you need to do something like this:
var length = 100000;
function repeater(i) {
    if (i < length) {
        x(i, i, function () {
            repeater(i + 1)
        })
    }
}
repeater(0)
This appears to be the most widely used approach. Well, there
is at least one blog post
about this technique, and it has been tied up into a nice library. Now, this
works great with our original implementations of x. But try it with the latest
one (i.e. the one that calls the callback immediately). What happens? Stack overflow
happens:
node.js:134
throw e; // process.nextTick error, or 'error' event on first tick
^
RangeError: Maximum call stack size exceeded
So now the question isn’t just whether the code ordering is a reasonable assumption to make. We also need to work out whether it is reasonable to assume that the callback gets a new stack each time it is called! Once again, if it is clearly documented it isn’t that much of a problem, but none of the standard library functions document whether they create a new stack or not.
The problem here is that common usage is conflicting. There is a lot of advice, and there are existing libraries, that assume a callback implies a new stack. At the same time there is existing code within the standard library that does not create a new stack each time. To make matters worse, this is not always consistent either: whether a new stack is created or the callback is executed on the existing stack can depend on the actual arguments passed to the function!
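As a sketch of how the arguments can decide which stack the callback runs on, consider a hypothetical cached lookup. The names cache and lookup are invented for this example and do not correspond to any real Node.js API; the "work" is just upper-casing the key.

```javascript
// Hypothetical example: cache and lookup are made-up names, not a real Node.js API.
var cache = {}
var log = []

function lookup(key, callback) {
    if (Object.prototype.hasOwnProperty.call(cache, key)) {
        // Cache hit: the callback runs immediately, on the caller's stack.
        callback(cache[key])
        return
    }
    // Cache miss: the value is computed and the callback run on the next tick.
    process.nextTick(function () {
        cache[key] = key.toUpperCase()
        callback(cache[key])
    })
}

lookup("a", function () {
    log.push("miss: callback")
    // Same function, same key, but this time the value is cached.
    lookup("a", function () {
        log.push("hit: callback")
    })
    log.push("hit: code after call")
})
log.push("miss: code after call")

setImmediate(function () {
    console.log(log.join("\n"))
})
```

On the first call the code after the call runs before the callback; on the second call, with the same key, the callback runs first.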
What then can we make of this mess? Well, once again, as a caller you need to make sure you understand when the state is going to be mutated by the function, and also exactly when, and on which stack your callback will be executed.
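If you cannot find out which stack your callback will run on, one defensive option on the caller's side is to unwind the stack yourself before recursing. The sketch below is only one possible approach, not an established fix: it reuses the immediate-callback version of x, adds a hypothetical done parameter to repeater, and assumes a Node.js version that provides setImmediate (process.nextTick would unwind the stack just as well).

```javascript
var a = 0
var b = 0

// The version of x that calls the callback immediately, on the caller's stack.
function x(new_a, new_b, callback) {
    a = new_a
    b = new_b
    callback()
}

var length = 100000

// done is a hypothetical extra parameter, called once every iteration has run.
function repeater(i, done) {
    if (i < length) {
        x(i, i, function () {
            // Let the current stack unwind before starting the next iteration.
            setImmediate(function () {
                repeater(i + 1, done)
            })
        })
    } else {
        done()
    }
}

repeater(0, function () {
    console.log("Finished", a, b)
})
```

The cost is an extra trip through the event loop per iteration, even when x would have deferred the callback anyway.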
As an API provider, as always, you need to document this stuff. But let’s try to stick to some common ground: callbacks should always be executed on a new stack, not on the existing one.
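Following that rule, a minimal sketch of x might defer only the callback. Note that in this version the state is still updated synchronously, so it remains semi-asynchronous in the sense used above; the only guarantee added is that the callback never runs on the caller's stack. The done parameter on repeater is a hypothetical addition for the demonstration.

```javascript
var a = 0
var b = 0

// The state is still updated synchronously here; only the callback is deferred.
function x(new_a, new_b, callback) {
    a = new_a
    b = new_b
    process.nextTick(callback)
}

var length = 100000

// done is a hypothetical extra parameter, called once every iteration has run.
function repeater(i, done) {
    if (i < length) {
        x(i, i, function () {
            repeater(i + 1, done)
        })
    } else {
        done()
    }
}

repeater(0, function () {
    console.log("Done", a, b)
})
console.log("Called", a, b)
```

With this version the repeater runs to completion without overflowing the stack, and the "Called" line is printed before the "Done" line.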