[Python-Dev] On a new version of pickle [PEP 3154]: self-referential frozensets

Alexandre Vassalotti alexandre at peadrop.com
Wed Jun 27 23:12:48 CEST 2012


On Sat, Jun 23, 2012 at 3:19 AM, M Stefan <mstefanro at gmail.com> wrote:

> * UNION_FROZENSET: like UPDATE_SET, but create a new frozenset
>    stack before: ... pyfrozenset mark stackslice
>    stack after : ... pyfrozenset.union(stackslice)
>

Since frozensets are immutable, could you explain how adding the
UNION_FROZENSET opcode helps in pickling self-referential frozensets? Or
are you only adding this one to follow the current style used for pickling
dicts and lists in protocols 1 and onward?


> While this design allows pickling of self-referential sets,
> self-referential
> frozensets are still problematic. For instance, trying to pickle `fs':
> a=A(); fs=frozenset([a]); a.fs = fs
> (when unpickling, the object a has to be initialized before it is added to
>  the frozenset)
>
> The only way I can think of to make this work is to postpone
> the initialization of all the objects inside the frozenset until after
> UNION_FROZENSET.
> I believe this is doable, but there might be memory penalties if the
> approach
> is to simply store all the initialization opcodes in memory until pickling
> the frozenset is finished.
>
>
I don't think that's the only way. You could also emit a POP opcode to
discard the frozenset from the stack and then emit a GET to fetch it back
from the memo. This is how we currently handle self-referential tuples.
Check out the save_tuple method in pickle.py to see how it is done.
Personally, I would prefer that approach because it is already well-tested
and proven to work.
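
The POP/GET handling of self-referential tuples described above can be
observed with a minimal sketch like the following. The variable names and
the choice of protocol 2 are assumptions made for illustration; the exact
opcode stream may differ between protocols.

```python
import pickle
import pickletools

# Build a tuple that contains itself by way of a mutable list.
inner = []
t = (inner,)
inner.append(t)

# Protocol 2 is assumed here only to keep the opcode stream short.
data = pickle.dumps(t, protocol=2)

# Collect the opcode names in the order the pickler emitted them.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]
print(opcode_names)

# Unpickling restores the cycle: the tuple is reachable from itself.
restored = pickle.loads(data)
print(restored[0][0] is restored)
```

The opcode list should contain a POP followed by a memo fetch: by the time
the outer tuple's items have been saved, the tuple is already in the memo,
so the pickler discards the item it pushed and fetches the memoized tuple
instead.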

That said, your approach sounds good too. The memory trade-off could lead
to smaller pickles and more efficient decoding (though these
self-referential objects are rare enough that I don't think that any
improvements there would matter much).

> While self-referential frozensets are uncommon, a far more problematic
> situation is with the self-referential objects created with REDUCE. While
> pickle uses the idea of creating empty collections and then filling them,
> reduce typically creates already-filled objects. For instance:
> cnt = collections.Counter(); cnt[a]=3; a.cnt=cnt; cnt.__reduce__()
> (<class 'collections.Counter'>, ({<__main__.A object at 0x0286E8F8>: 3},))
> where the A object contains a reference to the counter. Unpickling an
> object pickled with this reduce function is not possible, because the
> reduce
> function, which "explains" how to create the object, is asking for the
> object
> to exist before being created.
>

Your example seems to work on Python 3. I am not sure if I follow what you
are trying to say. Can you provide a working example?

$ python3
Python 3.1.2 (r312:79147, Dec  9 2011, 20:47:34)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, collections
>>> c = collections.Counter()
>>> class A: pass
...
>>> a = A()
>>> c[a] = 3
>>> a.cnt = c
>>> b = pickle.loads(pickle.dumps(a))
>>> b in b.cnt
True
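
For reference, the shape of the __reduce__() result quoted above can be
inspected without any self-reference. This is only a sketch: the sample
Counter contents and the names factory, args, and rebuilt are made up for
illustration.

```python
import collections

# A small Counter with no references back to itself.
cnt = collections.Counter("aab")

# __reduce__() returns a callable plus the arguments to call it with;
# for Counter, the single argument is an already-filled dict.
factory, args = cnt.__reduce__()[:2]
print(factory, args)

# Rebuilding means calling factory(*args), so every argument has to
# exist before the new object does.
rebuilt = factory(*args)
print(rebuilt == cnt)
```

Since the arguments must be complete before the callable runs, anything
stored inside them is created first.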


> Pickle could try to fix this by detecting when reduce returns a class type
> as the first tuple arg and move the dict ctor parameter to the state, but
> this may not always be intended. It's also a bit strange that __getstate__
> is never used anywhere in pickle directly.
>

I would advise against any such change. The reduce protocol is already
fairly complex. Further, I don't think changing it this way would give us
any extra flexibility.

The documentation has a good explanation of how __getstate__ works under
the hood:
http://docs.python.org/py3k/library/pickle.html#pickling-class-instances

And if you need more, PEP 307 (http://www.python.org/dev/peps/pep-0307/)
provides some of the design rationales of the API.