namedfields create performance

Mon, 01 Dec 2014 20:35:37 +0000

This is a follow-up to yesterday’s post about namedtuples.

Yesterday I mostly focussed on the performance of accessing attributes on a named tuple object, and the namedfields decorator approach that I showed ended up with the same performance as the standard library namedtuple. One operation that I didn’t consider, but which is actually reasonably common, is the creation of a new object.

My implementation relied on a generic __new__ that used the underlying _fields to work out the actual arguments to pass to the tuple.__new__ constructor:

    def __new__(_cls, *args, **kwargs):
        if len(args) > len(_cls._fields):
            raise TypeError("__new__ takes {} positional arguments but {} were given".format(len(_cls._fields) + 1, len(args) + 1))

        missing_args = tuple(fld for fld in _cls._fields[len(args):] if fld not in kwargs)
        if len(missing_args):
            raise TypeError("__new__ missing {} required positional arguments".format(len(missing_args)))
        extra_args = tuple(kwargs.pop(fld) for fld in _cls._fields[len(args):] if fld in kwargs)
        if len(kwargs) > 0:
            raise TypeError("__new__ got an unexpected keyword argument '{}'".format(list(kwargs.keys())[0]))

        return tuple.__new__(_cls, tuple(args + extra_args))

This seems to work (in my limited testing), but the code is pretty nasty (I’m far from confident that it is correct), and it is also slow: about 10x slower than constructing an instance of a class created with the namedtuple factory function, whose generated constructor is just:

    def __new__(_cls, bar, baz):
        'Create new instance of Foo2(bar, baz)'
        return _tuple.__new__(_cls, (bar, baz))
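A rough way to reproduce this kind of comparison is with timeit. The sketch below is not the benchmark behind the 10x figure: GenericFoo and time_constructor are hypothetical names, GenericFoo is a simplified stand-in for a class using the generic __new__, and the ratio it prints will vary with the Python version and the machine.

```python
import timeit
from collections import namedtuple


class GenericFoo(tuple):
    """Hypothetical stand-in for a class that uses a generic __new__."""
    __slots__ = ()
    _fields = ("bar", "baz")

    def __new__(_cls, *args, **kwargs):
        if len(args) > len(_cls._fields):
            raise TypeError("too many positional arguments")
        # Fields not covered by positional arguments must come from kwargs.
        rest = _cls._fields[len(args):]
        missing = [fld for fld in rest if fld not in kwargs]
        if missing:
            raise TypeError("missing arguments: {}".format(", ".join(missing)))
        extra = tuple(kwargs.pop(fld) for fld in rest)
        if kwargs:
            raise TypeError("unexpected keyword arguments: {}".format(
                ", ".join(sorted(kwargs))))
        return tuple.__new__(_cls, args + extra)


# The standard library equivalent, with a constructor generated per class.
Foo2 = namedtuple("Foo2", ["bar", "baz"])


def time_constructor(cls, number=200000):
    """Return the seconds taken to call cls(1, 2) `number` times."""
    return timeit.timeit(lambda: cls(1, 2), number=number)


if __name__ == "__main__":
    generic = time_constructor(GenericFoo)
    factory = time_constructor(Foo2)
    print("generic __new__: {:.3f}s".format(generic))
    print("namedtuple:      {:.3f}s".format(factory))
    print("ratio:           {:.1f}x".format(generic / factory))
```

Both timings include the overhead of the lambda wrapper that timeit calls, so the printed ratio slightly understates the real difference between the two constructors.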

As a result of this finding, I’ve changed my constructor approach, and now generate a custom constructor for each new class using eval. It looks something like:

        str_fields = ", ".join(fields)
        # The trailing comma keeps the argument a tuple even when there is only one field.
        new_method = eval("lambda cls, {}: tuple.__new__(cls, ({},))".format(str_fields, str_fields), {}, {})

With this change, constructor performance is on par with the namedtuple approach, and I’m much more confident that the code is actually correct!
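To show how a generated constructor like this might be wired into a class, here is a minimal self-contained sketch. It is not the code from the pyutil repo: make_new and Foo are hypothetical names, and the field-name validation is an addition of mine, included because eval will happily execute whatever ends up in the string.

```python
import keyword


def make_new(fields):
    """Build a __new__ whose signature is exactly `fields` (hypothetical helper)."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in fields:
        # eval runs arbitrary code, so only accept plain identifiers.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError("invalid field name: {!r}".format(name))
    str_fields = ", ".join(fields)
    # The trailing comma keeps the argument a tuple for a single field.
    source = "lambda cls, {0}: tuple.__new__(cls, ({0},))".format(str_fields)
    return eval(source, {}, {})


class Foo(tuple):
    __slots__ = ()
    _fields = ("bar", "baz")
    # Python treats a function named __new__ in a class body as a static method.
    __new__ = make_new(_fields)


if __name__ == "__main__":
    print(Foo(1, 2))
    print(Foo(baz=2, bar=1))
```

Because the lambda has a real signature, Python itself checks for missing, extra, and duplicated arguments and raises TypeError, which is what removes the need for the hand-written checks in the generic version.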

I’ve cleaned up the namedfields code a little, and made it available as part of my pyutil repo.
