[Python-Dev] PEP 574 -- Pickle protocol 5 with out-of-band data

Antoine Pitrou solipsis at pitrou.net
Thu Mar 29 04:08:02 EDT 2018


On Thu, 29 Mar 2018 01:40:17 +0000
Robert Collins <robertc at robertcollins.net> wrote:
> >
> > Data sharing
> > ------------
> >
> > If you pickle and then unpickle an object in the same process, passing
> > out-of-band buffer views, then the unpickled object may be backed by the
> > same buffer as the original pickled object.
> >
> > For example, it might be reasonable to implement reduction of a Numpy array
> > as follows (crucial metadata such as shapes is omitted for simplicity)::
> >
> >    class ndarray:
> >
> >       def __reduce_ex__(self, protocol):
> >          if protocol == 5:
> >             return numpy.frombuffer, (PickleBuffer(self), self.dtype)
> >          # Legacy code for earlier protocols omitted
> >
> > Then simply passing the PickleBuffer around from ``dumps`` to ``loads``
> > will produce a new Numpy array sharing the same underlying memory as the
> > original Numpy object (and, incidentally, keeping it alive)::  
> 
> This seems incompatible with v4 semantics. There, a loads plus dumps
> combination is approximately a deep copy. This isn't. Sometimes. Sometimes
> it is.

True.  But it's only incompatible if you pass the new
``buffer_callback`` and ``buffers`` arguments.  If you don't, the
buffers are serialized in-band and you always get a copy.  Code that
opts into out-of-band buffers should keep this in mind.
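A minimal sketch of both behaviours, assuming Python 3.8 or later (where PEP 574 shipped as ``pickle.PickleBuffer`` plus the ``buffer_callback`` argument of ``pickle.dumps`` and the ``buffers`` argument of ``pickle.loads``). The ``SharedBytes`` class and its ``_reconstruct`` helper are hypothetical, loosely modelled on the PEP's own examples. Without the new arguments the round trip yields an independent copy. With them, the unpickled object wraps the very same bytearray.

```python
import pickle


class SharedBytes:
    """Hypothetical container whose bytearray can travel out-of-band (protocol 5)."""

    def __init__(self, data):
        self.data = data

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the raw memory as a PickleBuffer.  With a buffer_callback
            # it is sent out-of-band; without one it is copied in-band.
            return type(self)._reconstruct, (pickle.PickleBuffer(self.data),)
        # Earlier protocols: always embed a copy of the bytes in the pickle.
        return type(self)._reconstruct, (bytearray(self.data),)

    @classmethod
    def _reconstruct(cls, buf):
        # In-band, buf is a fresh bytearray; out-of-band, it is the object
        # passed in ``buffers`` (here the original PickleBuffer).  In both
        # cases memoryview(buf).obj is the underlying bytearray.
        with memoryview(buf) as m:
            return cls(m.obj)


def round_trips():
    original = SharedBytes(bytearray(b"abc"))

    # No buffer_callback/buffers: loads() rebuilds an independent copy.
    copied = pickle.loads(pickle.dumps(original, protocol=5))

    # With buffer_callback/buffers: the PickleBuffer bypasses the pickle
    # stream and is handed straight back to loads(), so memory is shared.
    buffers = []
    payload = pickle.dumps(original, protocol=5, buffer_callback=buffers.append)
    shared = pickle.loads(payload, buffers=buffers)

    original.data[0] = ord("X")  # mutate the original in place
    return original, copied, shared


original, copied, shared = round_trips()
print(copied.data)                   # bytearray(b'abc')
print(shared.data)                   # bytearray(b'Xbc')
print(shared.data is original.data)  # True
```

Routing both paths through ``memoryview(buf).obj`` lets a single reconstructor handle the in-band copy and the out-of-band buffer alike.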

Note there's a movement towards immutable data, for which sharing
memory rather than copying it makes no observable difference.  For
example, Dask arrays and Arrow arrays are designed to be immutable.

Regards

Antoine.

