[Python-Dev] On a new version of pickle [PEP 3154]: self-referential frozensets

M Stefan mstefanro at gmail.com
Sat Jun 23 12:19:05 CEST 2012


Hello,

I'm one of this year's Google Summer of Code students working
on improving pickle by creating a new protocol version [1]. My name is
Stefan and my mentor is Alexandre Vassalotti.

If you're interested, you can monitor the progress in the dedicated
blog at [2] and the bitbucket repository at [3].

One of the goals for pickle v4 is to add native opcodes for pickling
sets and frozensets. So far, these 4 opcodes have been added:
* EMPTY_SET, EMPTY_FROZENSET: push an empty set/frozenset onto the stack
* UPDATE_SET: update the set on the stack with the top stack slice
     stack before: ... pyset mark stackslice
     stack after : ... pyset
     effect: pyset.update(stackslice)   # inplace union
* UNION_FROZENSET: like UPDATE_SET, but create a new frozenset
     stack before: ... pyfrozenset mark stackslice
     stack after : ... pyfrozenset.union(stackslice)
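
A rough sketch of the stack effects described above, written as plain
Python rather than as the real unpickler. The names MARK, pop_slice,
update_set and union_frozenset are made up for illustration and are not
part of pickle; a list stands in for the unpickler stack.

```python
MARK = object()  # hypothetical sentinel standing in for pickle's MARK


def pop_slice(stack):
    """Remove and return the items above the topmost MARK (and the MARK itself)."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] is MARK:
            items = stack[i + 1:]
            del stack[i:]
            return items
    raise ValueError("no MARK on the stack")


def update_set(stack):
    """UPDATE_SET: ... pyset mark stackslice  ->  ... pyset (same object)."""
    items = pop_slice(stack)
    stack[-1].update(items)  # in-place union


def union_frozenset(stack):
    """UNION_FROZENSET: the frozenset on the stack is replaced by a new one."""
    items = pop_slice(stack)
    stack[-1] = stack[-1].union(items)


s = set()
set_stack = [s, MARK, 1, 2, 3]
update_set(set_stack)

empty = frozenset()
frozen_stack = [empty, MARK, "a", "b"]
union_frozenset(frozen_stack)
```

The set is filled in place, so anything that already holds a reference to
it sees the new items. The frozenset is replaced by a different object,
which is what makes self-references awkward.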

While this design allows pickling of self-referential sets, self-referential
frozensets are still problematic. For instance, trying to pickle `fs':
a=A(); fs=frozenset([a]); a.fs = fs
(when unpickling, the object a has to be initialized before it is added to
  the frozenset)

The only way I can think of to make this work is to postpone the
initialization of all the objects inside the frozenset until after
UNION_FROZENSET. I believe this is doable, but there might be memory
penalties if the approach is simply to store all the initialization
opcodes in memory until pickling of the frozenset is finished.

Currently, pickle.dumps(fs,4) generates:
EMPTY_FROZENSET
BINPUT 0
MARK
     BINGLOBAL_COMMON '0 A' # same as GLOBAL '__main__ A' in v3
     EMPTY_TUPLE
     NEWOBJ
     EMPTY_DICT
     SHORT_BINUNICODE 'fs'
     BINGET 0     # retrieves the frozenset, which is empty at this point
                  # and will never be filled because it's immutable
     SETITEM
     BUILD           # a.__setstate__({'fs' : frozenset()})
     UNION_FROZENSET
By postponing the initialization of a, it should instead generate:
EMPTY_FROZENSET
BINPUT 0
MARK
     BINGLOBAL_COMMON '0 A' # same as GLOBAL '__main__ A' in v3
     EMPTY_TUPLE
     NEWOBJ # create the object but don't initialize its state yet
     BINPUT 1
     UNION_FROZENSET
BINGET 1
EMPTY_DICT
SHORT_BINUNICODE 'fs'
BINGET 0
SETITEM
BUILD
POP
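
To make the difference between the two opcode streams concrete, here is
a hand-written replay of each in plain Python. This is only a sketch: the
functions replay_current and replay_postponed and the memo dict are
illustrative names, not pickle internals. It also assumes that after
UNION_FROZENSET memo slot 0 refers to the filled frozenset; the listing
does not show how that happens, so the re-binding below is an assumption.

```python
class A:
    pass


def replay_current():
    memo = {}
    memo[0] = frozenset()          # EMPTY_FROZENSET, BINPUT 0
    a = A.__new__(A)               # BINGLOBAL_COMMON, EMPTY_TUPLE, NEWOBJ
    state = {"fs": memo[0]}        # EMPTY_DICT, SHORT_BINUNICODE, BINGET 0, SETITEM
    a.__dict__.update(state)       # BUILD: a.fs is the empty frozenset
    return memo[0].union([a])      # UNION_FROZENSET creates a new frozenset


def replay_postponed():
    memo = {}
    memo[0] = frozenset()          # EMPTY_FROZENSET, BINPUT 0
    a = A.__new__(A)               # NEWOBJ, state not set yet
    memo[1] = a                    # BINPUT 1
    fs = memo[0].union([a])        # UNION_FROZENSET
    memo[0] = fs                   # ASSUMPTION: slot 0 now names the filled frozenset
    obj = memo[1]                  # BINGET 1
    state = {"fs": memo[0]}        # EMPTY_DICT, SHORT_BINUNICODE, BINGET 0, SETITEM
    obj.__dict__.update(state)     # BUILD
    return fs                      # POP leaves the frozenset on top


fs_current = replay_current()
(a_current,) = fs_current
fs_postponed = replay_postponed()
(a_postponed,) = fs_postponed
```

With the current stream, a.fs ends up as an empty frozenset that is not
the one containing a. With the postponed stream, a.fs is the frozenset
that contains a, so the cycle is restored.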

While self-referential frozensets are uncommon, a far more problematic
situation arises with self-referential objects created with REDUCE. While
pickle uses the idea of creating empty collections and then filling them,
reduce typically creates already-filled objects. For instance:
cnt = collections.Counter(); cnt[a]=3; a.cnt=cnt; cnt.__reduce__()
(<class 'collections.Counter'>, ({<__main__.A object at 0x0286E8F8>: 3},))
where the A object contains a reference to the counter. Unpickling an
object pickled with this reduce function is not possible, because the
reduce function, which "explains" how to create the object, is asking for
the object to exist before being created.
The fix here would be to pass Counter's dictionary in the state argument,
as opposed to the "constructor parameters" one, as follows:
(<class 'collections.Counter'>, (), {<__main__.A object at 0x0286E8F8>: 3})
When unpickling this, an empty Counter will be created first, and then
__setstate__ will be called to fill it, at which point self-references
are allowed.
I assume this modification has to be done in the implementations of the
data structures rather than in pickle itself. Pickle could try to fix this
by detecting when reduce returns a class as the first tuple element and
moving the dict constructor parameter to the state, but this may not always
be intended.
It's also a bit strange that __getstate__ is never used anywhere in
pickle directly.
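
As an illustration of the state-based variant, here is a minimal sketch
using a hypothetical subclass, StateCounter, which is not part of
collections. To keep the example independent of how classes are looked up
by name, it round-trips the object with copy.deepcopy, which consumes the
same __reduce__ output (create the object from the callable and its
arguments, then apply the state). It is a stand-in for the unpickling
order, not the pickle code path itself.

```python
import collections
import copy


class A:
    pass


class StateCounter(collections.Counter):
    """Hypothetical Counter variant that reports its contents as state."""

    def __reduce__(self):
        # No constructor arguments; the contents travel as state instead.
        return (self.__class__, (), dict(self))

    def __setstate__(self, state):
        # Called on an already-created, empty counter.
        self.update(state)


a = A()

# The stock Counter puts its contents into the constructor arguments.
plain = collections.Counter()
plain[a] = 3
plain_args = plain.__reduce__()[1]

# The variant puts them into the state slot.
cnt = StateCounter()
cnt[a] = 3
a.cnt = cnt                      # self-reference: cnt -> a -> cnt
reduced = cnt.__reduce__()

clone = copy.deepcopy(cnt)       # create empty counter, then __setstate__
(clone_a,) = clone
```

Because the empty counter exists before its state is rebuilt, the copy of
a can point back at the copied counter, and the cycle survives.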

I'm looking forward to hearing your suggestions and opinions on this matter.

Regards,
   Stefan

[1] http://www.python.org/dev/peps/pep-3154/
[2] http://pypickle4.wordpress.com/
[3] http://bitbucket.org/mstefanro/pickle4

