pickling a subclass of tuple

Alex Martelli aleaxit at yahoo.com
Sat Jan 1 11:28:55 EST 2005


fedor <nobody at here.com> wrote:

> Hi all, happy new year,
> 
> I was trying to pickle an instance of a subclass of a tuple when I ran
> into a problem. Pickling doesn't work with HIGHEST_PROTOCOL. How should
> I rewrite my class so I can pickle it?

You're falling afoul of an optimization in pickle's protocol 2, which is
documented in pickle.py as follows:

# A __reduce__ implementation can direct protocol 2 to
# use the more efficient NEWOBJ opcode, while still
# allowing protocol 0 and 1 to work normally.  For this to
# work, the function returned by __reduce__ should be
# called __newobj__, and its first argument should be a
# new-style class.  The implementation for __newobj__
# should be as follows, although pickle has no way to
# verify this:
#
# def __newobj__(cls, *args):
#     return cls.__new__(cls, *args)
#
# Protocols 0 and 1 will pickle a reference to __newobj__,
# while protocol 2 (and above) will pickle a reference to
# cls, the remaining args tuple, and the NEWOBJ code,
# which calls cls.__new__(cls, *args) at unpickling time
# (see load_newobj below).  If __reduce__ returns a
# three-tuple, the state from the third tuple item will be
# pickled regardless of the protocol, calling __setstate__
# at unpickling time (see load_build below).

Essentially, and simplifying just a little...: you're inheriting
__reduce_ex__ (because you're not overriding it), but you ARE overriding
__new__ *and changing its signature* -- so, the inherited __reduce_ex__
is used, and, with this protocol 2 optimization, it essentially assumes
that the inherited __new__ is similarly used -- or, at least, that
whatever __new__ is used does not arbitrarily change the signature!

So, if you want to change __new__'s signature, and yet be picklable by
protocol 2, you have to override __reduce_ex__ to return the right
"args"... those your class's __new__ expects!


For example, you could consider something like...:

def __newobj__(cls, *args):
    return cls.__new__(cls, *args)

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __reduce_ex__(self, proto=0):
        if proto >= 2:
            return __newobj__, (A, self[0], self[1])
        else:
            return super(A, self).__reduce_ex__(proto)
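With that override in place, a protocol-2 round trip works.  Here's a
self-contained sketch (the class repeated from above so the snippet
stands alone, plus the pickle calls -- the variable names are mine):

```python
import pickle

def __newobj__(cls, *args):
    return cls.__new__(cls, *args)

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __reduce_ex__(self, proto=0):
        if proto >= 2:
            # hand protocol 2 exactly the args A.__new__ expects
            return __newobj__, (A, self[0], self[1])
        else:
            return super(A, self).__reduce_ex__(proto)

a = A(1, 2)
b = pickle.loads(pickle.dumps(a, 2))
assert isinstance(b, A) and b == (1, 2)
```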

Note the key difference in A's __reduce_ex__ (for proto=2) with respect
to tuple's (which is the same as object's) -- here, after an "import a",
where a.py has this code as well as an 'a = A(1, 2)'...:

>>> a.a.__reduce_ex__(2)
(<function __newobj__ at 0x3827f0>, (<class 'a.A'>, 1, 2))
>>> tuple.__reduce_ex__(a.a, 2)
(<function __newobj__ at 0x376770>, (<class 'a.A'>, (1, 2)), {}, None,
None)
>>> 

Apart from the additional tuple items (not relevant here), tuple's
reduce returns args as (<class 'a.A'>, (1, 2)) -- two items: the class
and the tuple value; so with protocol 2 this ends up calling A.__new__(A,
(1, 2))... BOOM, because, differently from tuple.__new__, YOUR override
doesn't accept this signature!  So, I suggest tweaking A's reduce so it
returns args as (<class 'a.A'>, 1, 2)... apparently the only signature
you're willing to accept in your A.__new__ method.
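Here's a minimal sketch of that BOOM, with a bare subclass that changes
__new__'s signature and adds no pickling support at all (in current
Pythons the TypeError surfaces at loads time, when NEWOBJ calls your
__new__):

```python
import pickle

class A(tuple):
    # changed __new__ signature, no pickling support added
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

a = A(1, 2)
# protocols 0 and 1 reconstruct via tuple directly, bypassing A.__new__:
assert pickle.loads(pickle.dumps(a, 1)) == (1, 2)
# protocol 2 records newargs as ((1, 2),) and calls A.__new__(A, (1, 2)):
try:
    pickle.loads(pickle.dumps(a, 2))
    raised = False
except TypeError:
    raised = True
assert raised
```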

Of course, if A.__new__ can have some flexibility, you COULD have it
accept the same signature as tuple.__new__ and then you wouldn't have to
override __reduce_ex__.  Or, you could override __reduce_ex__ in other
ways, say:

    def __reduce_ex__(self, proto=0):
        if proto >= 2:
            proto = 1
        return super(A, self).__reduce_ex__(proto)

this would avoid the specific optimization that's tripping you up due to
your signature-change in __new__.
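A quick sketch of this variant in action -- forcing the protocol-0/1
path means reconstruction goes through the generic reconstructor in
copyreg (copy_reg in 2.x), which calls tuple.__new__ directly and so
bypasses A.__new__ entirely:

```python
import pickle

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __reduce_ex__(self, proto=0):
        # never let the protocol-2 NEWOBJ optimization kick in
        if proto >= 2:
            proto = 1
        return super(A, self).__reduce_ex__(proto)

a = A(1, 2)
b = pickle.loads(pickle.dumps(a, 2))  # asked for 2, reduced as 1
assert isinstance(b, A) and b == (1, 2)
```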

The best solution may be to forget __reduce_ex__ and take advantage of
the underdocumented special method __getnewargs__ ...:

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __getnewargs__(self):
        return self[0], self[1]

This way, you're essentially choosing to explicitly tell the "normal"
__reduce_ex__ about the particular arguments you want to be used for the
__new__ call needed to reconstruct your object on unpickling!  This
highlights even better the crucial difference, due strictly to the
change in __new__'s signature...:

>>> a.a.__getnewargs__()
(1, 2)
>>> tuple.__getnewargs__(a.a)
((1, 2),)
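And a final sanity check: with __getnewargs__ in place, the class
round-trips under every protocol (a sketch, repeating the class so the
snippet stands alone):

```python
import pickle

class A(tuple):
    def __new__(klass, arg1, arg2):
        return super(A, klass).__new__(klass, (arg1, arg2))

    def __getnewargs__(self):
        # the args the inherited __reduce_ex__ should record for __new__
        return self[0], self[1]

a = A(1, 2)
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    b = pickle.loads(pickle.dumps(a, proto))
    assert isinstance(b, A) and b == (1, 2)
```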



It IS, I guess, somewhat unfortunate that you have to understand
pickling in some depth to let you change __new__'s signature and yet
fully support pickling... on the other hand, when you're overriding
__new__ you ARE messing with some rather deep infrastructure,
particularly if you alter its signature so that it doesn't accept
"normal" calls any more, so it's not _absurd_ that compensatory depth of
understanding is required;-).


Alex


