enhancement request: make py3 read/write py2 pickle format

Robert Kern robert.kern at gmail.com
Wed Jun 10 09:52:29 EDT 2015


On 2015-06-10 13:08, Marko Rauhamaa wrote:
> Robert Kern <robert.kern at gmail.com>:
>
>> By the very nature of the stated problem: serializing all language
>> objects. Being able to construct any object, including instances of
>> arbitrary classes, means that arbitrary code can be executed. All I
>> have to do is make a pickle file for an object that claims that its
>> constructor is shutil.rmtree().
>
> You can't serialize/migrate arbitrary objects. Consider open TCP
> connections, open files and other objects that extend outside the Python
> VM.

Yes, yes, but that's really beside the point. Yes, there are some objects for 
which it doesn't even make sense to serialize. But my point is that even in this 
slightly smaller set of objects that *can* be serialized (and pickle currently 
does serialize), being able to serialize all of them entails arbitrary code 
execution to deserialize them. To allow people to write their own types that can 
be serialized, you have to let them specify arbitrary callables that will do the 
reconstruction. If you whitelist the possible reconstruction callables, you have 
greatly restricted the types that can participate in the serialization system.

> Also objects hold references to each other, leading to a huge
> reference mesh.
>
> For example:
>
>     a.buddy = b
>     b.buddy = a
>     with open("a", "wb") as f: f.write(serialize(a))
>     with open("b", "wb") as f: f.write(serialize(b))
>
>     with open("a", "rb") as f: aa = deserialize(f.read())
>     with open("b", "rb") as f: bb = deserialize(f.read())
>     assert aa.buddy is bb

Yeah, no one expects that to work. For example, if I deserialize the same string 
twice, you can't expect to get identical returned objects (as in, 
"deserialize(pickle) is deserialize(pickle)"). However, pickle does correctly 
handle fairly arbitrary reference graphs within the context of a single 
serialization, which is the most that can be asked of a serialization system. 
That isn't really a concern here.

 >>> class A(object):
...     pass
...
 >>> a = A()
 >>> b = A()
 >>> a.buddy = b
 >>> b.buddy = a
 >>> data = [a, b]
 >>> data[0].buddy is data[1]
True
 >>> data[1].buddy is data[0]
True
 >>> import cPickle
 >>> unpickled = cPickle.loads(cPickle.dumps(data))
 >>> unpickled[0].buddy is unpickled[1]
True
 >>> unpickled[1].buddy is unpickled[0]
True

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco




More information about the Python-list mailing list