[Python-ideas] new pickle semantics/API

Fri Jan 26 00:29:02 CET 2007

"tomer filiba" <tomerfiliba at gmail.com> wrote:
> On 1/25/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Overall, I like the idea; I'm a big fan of simplifying object
> > persistence and/or serialization.  A part of me also likes how the
> > objects can choose to lie about their types.
> >
> > But another part of me says: the basic objects that you specified
> > already have a format that is unambiguous, repr(obj).  They also are
> > able to be reconstructed from their component parts via eval(repr(obj)),
> > or even via the 'unrepr' function in the ConfigObj module.  It doesn't
> > handle circular references.
> 
> well, repr is fine for most simple things, but you don't use repr to
> serialize objects, right? it's not powerful/introspective enough.
> besides repr is meant to be readable, while __getstate__ can return
> any object. imagine this:

I use repr to serialize objects all the time.  ConfigObj is great when I
want to handle python-based configuration information, and/or I don't
want to worry about the security implications of 'eval(arbitrary string)',
or 'import module'.
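
A minimal sketch of that round trip, for illustration only. ConfigObj's
'unrepr' is not shown here; the standard library's ast.literal_eval (which
is newer than this thread) stands in for it, and that substitution is an
assumption. The 'settings' dictionary is made up for the example.

```python
import ast

# Hypothetical configuration made only of basic built-in types.
settings = {"name": "demo", "ports": [8080, 8081], "ratio": 0.1, "debug": False}

# repr() gives a string a person can read and edit in any text editor.
text = repr(settings)

# literal_eval parses Python literals only; unlike eval(), it will not
# call functions or import modules found in the string.
restored = ast.literal_eval(text)

assert restored == settings
```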

With a proper __repr__ method, I can even write towards your API:

class mylist(object):
    def __repr__(self):
        state = ...
        return 'mylist.__setstate__(%r)'%(state,)
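
One way the snippet above could be filled in so that it actually round
trips, as a sketch under assumptions: the 'items' attribute, the
constructor, and treating __setstate__ as a classmethod-style constructor
are all hypothetical choices made for this example, not part of the
proposal under discussion.

```python
class mylist(object):
    def __init__(self, items=()):
        self.items = list(items)

    def __getstate__(self):
        # State expressed in terms of more primitive objects.
        return self.items

    @classmethod
    def __setstate__(cls, state):
        # Hypothetical: used here as an alternate constructor.
        return cls(state)

    def __repr__(self):
        return 'mylist.__setstate__(%r)' % (self.__getstate__(),)


original = mylist([1, 'two', 3.0])
text = repr(original)  # "mylist.__setstate__([1, 'two', 3.0])"

# eval() sees only the one name it needs. This narrows what the string
# can reach, but it is still not safe for untrusted input.
clone = eval(text, {'__builtins__': {}, 'mylist': mylist})

assert clone.items == original.items
```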

> class complex:
>     def __repr__(self):
>         return "(%f+%fj)" % (self.real, self.imag)

I would use 'return "(%r+%rj)"% (self.real, self.imag)', but it doesn't
much matter.
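
Where it can matter is precision: %f rounds to six decimal places, while
%r emits a float repr that parses back to the same value. A small sketch
with made-up values:

```python
real, imag = 0.1, 1e-10

as_f = "(%f+%fj)" % (real, imag)  # "(0.100000+0.000000j)"
as_r = "(%r+%rj)" % (real, imag)

# The %r form evaluates back to the original value; the %f form has
# already lost the small imaginary part.
assert eval(as_r) == complex(real, imag)
assert eval(as_f) != complex(real, imag)
```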

> repr is made for humans of course, while serialization is
> made for machines. they serve different purposes,
> so they need different APIs.

I happen to disagree.  The only reason to use a different representation
or API is if there are size and/or performance benefits to offering a
machine readable vs. human readable format.

I know that there are real performance advantages to using (c)Pickle
over repr/unrepr, but I still use repr/unrepr so that I can change
settings with notepad (as has been necessary on occasion).

> > Even better, it has 3 native representations; repr(a).encode('zlib'),
> > repr(a), pprint.pprint(a); each offering a different amount of user
> > readability.  I digress.
> 
> you may have digressed, but that's a good point -- that's exactly
> why i do NOT specify how objects are encoded as a stream of bytes.
> 
> all i'm after is the state of the object (which is expressed in terms of
> other, more primitive objects).

Right, but as 'primitive objects' go, you can't get significantly more
primitive than producing a string that can be naively understood by
someone familiar with Python *and* the built-in Python parser.
Never mind that it works *today* with all of the types you specified
earlier (with the exception of file objects - which you discover on
parsing/reproducing the object).

> you can think of repr as a textual serializer to some extent, that
> can use the proposed __getstate__ API. pprint is yet another
> form of serializer.

Well, pprint is more or less a pretty repr.
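
A sketch of the three representations mentioned earlier in the thread,
each parsing back to the same object. The 'zlib' string codec from the
quoted text is spelled here as zlib.compress on the encoded repr, and
ast.literal_eval again stands in for 'unrepr'; both are assumptions made
so the example runs on a current Python. The dictionary is made up.

```python
import ast
import pprint
import zlib

a = {"hosts": ["alpha", "beta"], "retries": 3, "timeout": 2.5}

plain = repr(a)                                  # one line
pretty = pprint.pformat(a, width=20)             # wrapped over several lines
packed = zlib.compress(plain.encode("utf-8"))    # compact, not readable

# All three decode to an equal object.
assert ast.literal_eval(plain) == a
assert ast.literal_eval(pretty) == a
assert ast.literal_eval(zlib.decompress(packed).decode("utf-8")) == a
```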

> > I believe the biggest problem with the proposal, as specified, is that
> > changing the semantics of __getstate__ and __setstate__ is a bad idea.
> > Add a new pair of methods and ask the twisted people what they think.
> > My only criticism will then be the strawman repr/unrepr.
> 
> i'll try to come up with new names... but i don't have any ideas
> at the moment.

Like Colin, I also like __rebuild__.

 - Josiah