PEP-0492 was recently approved, giving Python 3.5 some special syntax for dealing with co-routines. A lot of the new functionality was available pre-3.5, but the syntax certainly wasn’t ideal, as the concepts of generators and co-routines were kind of intermingled. PEP-0492 makes an explicit distinction between generators and co-routines through the use of the async keyword.
This post aims to describe how these new mechanisms work at a rather low level. If you are mostly interested in just using this functionality for high-level stuff, I recommend skipping this post and reading up on the built-in asyncio module. If you are interested in how these low-level concepts can be used to build up your own version of the asyncio module, then you might find this interesting.
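Just to set expectations, the high-level route would look something like the following sketch (the say co-routine is purely an illustrative stand-in); the rest of this post builds the underlying machinery by hand instead:

import asyncio

async def say(msg):
    # A trivial co-routine purely for illustration.
    print(msg)

# Let asyncio's event loop drive the co-routines to completion.
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(say("hello"), say("world")))
loop.close()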
For this post we’re going to totally ignore any asynchronous I/O aspect and just limit things to interleaving progress from multiple co-routines. Here are two very simple functions:
def coro1():
    print("C1: Start")
    print("C1: Stop")

def coro2():
    print("C2: Start")
    print("C2: a")
    print("C2: b")
    print("C2: c")
    print("C2: Stop")
We start with two very simple functions, coro1 and coro2. We could call these functions one after the other:
coro1()
coro2()
and we’d get the expected output:
C1: Start
C1: Stop
C2: Start
C2: a
C2: b
C2: c
C2: Stop
But, for some reason, rather than running these one after the other, we’d like to interleave the execution. We can’t just do that with normal functions, so let’s turn these into co-routines:
async def coro1():
    print("C1: Start")
    print("C1: Stop")

async def coro2():
    print("C2: Start")
    print("C2: a")
    print("C2: b")
    print("C2: c")
    print("C2: Stop")
Through the magic of the new async keyword these functions are no longer functions; they are now co-routines (or, more specifically, native co-routine functions). When you call a normal function, the function body is executed; however, when you call a co-routine function the body isn’t executed. Instead you get back a co-routine object:
c1 = coro1()
c2 = coro2()
print(c1, c2)
gives:
<coroutine object coro1 at 0x10ea60990> <coroutine object coro2 at 0x10ea60a40>
(The interpreter will also print some runtime warnings that we’ll ignore for now.)
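For the curious, those warnings look something like the following (the exact prefix depends on where and when the objects get garbage-collected); CPython is pointing out that we created co-routine objects but never actually ran them:

RuntimeWarning: coroutine 'coro1' was never awaited
RuntimeWarning: coroutine 'coro2' was never awaited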
So, what good is having a co-routine object? How do we actually execute the thing? Well,
one way to execute a co-routine is through an await expression (using the new await
keyword). You might think you could do something like:
await c1
but you’d be disappointed. An await expression is only valid syntax when contained within a native co-routine function. You could do something like:
async def main():
    await c1
but of course then you are left with the problem of how to force the execution of main
!
The trick to realise is that co-routines are actually pretty similar to Python generators, and have the same send method. We can kick off execution of a co-routine by calling the send method:
c1.send(None)
This gets our first co-routine executing to completion; however, we also get a nasty StopIteration exception:
C1: Start
C1: Stop
Traceback (most recent call last):
  File "test3.py", line 16, in <module>
    c1.send(None)
StopIteration
The StopIteration exception is the mechanism used to indicate that a generator (or, in this case, a co-routine) has completed execution. Despite being an exception it is actually quite expected! We can wrap this in an appropriate try/except block to avoid the error condition. At the same time, let’s start the execution of our second co-routine:
try:
    c1.send(None)
except StopIteration:
    pass

try:
    c2.send(None)
except StopIteration:
    pass
Now we get complete output, but it is disappointingly similar to our original output. So we have a bunch more code, but no actual interleaving yet! Co-routines are not dissimilar to threads, in that they allow the interleaving of multiple distinct threads of control; however, unlike threads, with co-routines any switching is explicit rather than implicit (which is, in many cases, a good thing!). So we need to put in some of these explicit switches.
Normally the send
method on generators will execute
until the generator yields a value (using the yield
keyword), so you might think we could change coro1
to
something like:
async def coro1():
    print("C1: Start")
    yield
    print("C1: Stop")
but we can’t use yield inside a co-routine. Instead we use the new await expression, which suspends execution of the co-routine until the awaitable completes. So we need something like await _something_; the question is, what is the something in this case? We can’t just await on nothing! The PEP explains which things are awaitable. One of them is another native co-routine, but that doesn’t help get to the bottom of things. One of them is an object defined with a special CPython API, but we want to avoid extension modules and stick to pure Python right now. That leaves two options: either a generator-based co-routine object or a special Future-like object.
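(As an aside, the Future-like route is not much code either, although we won’t pursue it here. A minimal sketch, with Switch as a purely illustrative name, might look like this:)

class Switch:
    # A minimal "Future-like" object: __await__ must return an iterator,
    # and using a generator here means the single yield suspends whichever
    # co-routine is awaiting us.
    def __await__(self):
        yield

Awaiting Switch() inside a co-routine would suspend it in much the same way as the generator-based approach we use below.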
So, let’s go with the generator-based co-routine object to start with. Basically, a Python generator (i.e. something that has a yield in it) can be marked as a co-routine through the types.coroutine decorator. So a very simple example of this would be:
import types

@types.coroutine
def switch():
    yield
This defines a generator-based co-routine function. To get a generator-based co-routine object we just call the function. So, we can change our coro1 co-routine to:
async def coro1():
    print("C1: Start")
    await switch()
    print("C1: Stop")
With this in place, we hope that we can interleave our execution of
coro1
with the execution of coro2
. If we try
it with our existing code we get the output:
C1: Start
C2: Start
C2: a
C2: b
C2: c
C2: Stop
We can see that as expected coro1
stopped executing after the first print statement,
and then coro2
was able to execute. In fact, we can look at the co-routine object
and see exactly where it is suspended with some code like this:
print("c1 suspended at: {}:{}".format(c1.gi_frame.f_code.co_filename, c1.gi_frame.f_lineno))
which prints out the line of your await expression. (Note: this gives you the outer-most await, so it is mostly just for explanatory purposes here, and not particularly useful in the general case.)
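A slightly less fiddly way to check this, assuming a final 3.5 release where the inspect helpers that landed alongside PEP-0492 are available, is something like:

import inspect

# Reports one of CORO_CREATED, CORO_RUNNING, CORO_SUSPENDED or CORO_CLOSED.
print(inspect.getcoroutinestate(c1))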
OK, the question now is: how can we resume coro1 so that it executes to completion? We can just use send again. So we end up with some code like:
try:
    c1.send(None)
except StopIteration:
    pass

try:
    c2.send(None)
except StopIteration:
    pass

try:
    c1.send(None)
except StopIteration:
    pass
which then gives us our expected output:
C1: Start
C2: Start
C2: a
C2: b
C2: c
C2: Stop
C1: Stop
So, at this point we’re manually pushing the co-routines through to completion by explicitly calling send on each individual co-routine object. This isn’t going to work in general. What we’d really like is a function that keeps executing all our co-routines until they have all completed. In other words, we want to continually call send on each co-routine object until that method raises the StopIteration exception.
So, let’s create a function that takes in a list of co-routines and executes them until
completion. We’ll call this function run
.
def run(coros):
    coros = list(coros)
    while coros:
        # Duplicate list for iteration so we can remove from original list.
        for coro in list(coros):
            try:
                coro.send(None)
            except StopIteration:
                coros.remove(coro)
This picks each co-routine from the list of co-routines in turn and executes it; when a StopIteration exception is raised, the co-routine is removed from the list.
We can then remove the code manually calling the send method and instead do something like:
c1 = coro1()
c2 = coro2()
run([c1, c2])
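Since run visits the co-routines in list order, this should give the same interleaved output as before:

C1: Start
C2: Start
C2: a
C2: b
C2: c
C2: Stop
C1: Stop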
And now we have a very simple run-time for executing co-routines using the new await and async features in Python 3.5. Code related to this post is available on GitHub.