[Pandas-dev] pickle is evil

Jeff Reback jeffreback at gmail.com
Mon Apr 22 03:19:48 CEST 2013


I realized I didn't answer your question.

This only catches errors on pickle.load:

try:
    pickle.load
except TypeError:
    pickle_compat.load
except:
    if not PY3:
        raise
    # try to unpickle with an encoding here
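
Fleshed out a little, the fallback order above might look roughly like the
following. This is only a sketch under assumptions: read_pickle, its
compat_load argument and the latin1 default are hypothetical names and
choices for illustration, not an existing pandas API. compat_load stands in
for pickle_compat.load, and the encoding keyword of pickle.load only exists
on Python 3.

```python
import pickle
import sys

PY3 = sys.version_info[0] >= 3


def read_pickle(path, compat_load=None, encoding="latin1"):
    """Load a pickle, falling back to compatibility loaders on failure.

    Hypothetical helper: compat_load stands in for pickle_compat.load and
    is called with the file path.
    """
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except TypeError:
        # Old class layout: hand the file to the compatibility unpickler.
        if compat_load is None:
            raise
        return compat_load(path)
    except Exception:
        # Anything else is only retried on Python 3, where the failure may
        # come from a Python 2 pickle that needs an explicit encoding.
        if not PY3:
            raise
        with open(path, "rb") as fh:
            return pickle.load(fh, encoding=encoding)
```

The file is reopened for each attempt, so a failed first read cannot leave a
later loader with a half-consumed file handle.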

On Apr 21, 2013, at 9:12 PM, Jeff Reback <jeffreback at gmail.com> wrote:

> Avro (a better choice than msgpack, I think)
> will be a very straightforward add-on.
> 
> The format should probably be designed independently of the internals anyhow, at the price of a bit more code; alternatively, it could store the block managers and be somewhat simpler code-wise.
> 
> 
> 
> On Apr 21, 2013, at 9:01 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> 
>> On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback <jeffreback at gmail.com> wrote:
>>> I thought I'd share a particularly evil pickle issue. In my refactor of
>>> Series to not subclass ndarray, the new pickling tests were breaking. No
>>> surprise, because I changed __getstate__ to pickle via the BlockManager.
>>> To ensure compat, I thought I could just fix __setstate__ and figure out
>>> what to do based on the returned state (e.g. the len of the state, which
>>> comes back as a tuple or dict or whatever).
>>> 
>>> But no... apparently the reconstruction algorithm takes the class name that
>>> it sees and tries to create it w/o using __new__ (or anything else that you
>>> can intercept). It uses a built-in function called _reconstruct (I can't
>>> figure out how to override it at all; it must be C code only).
>>> 
>>> And then numpy gets ahold of it (as it's an extension type) and complains
>>> because the class I am trying to instantiate isn't actually a subclass of
>>> ndarray (which it presupposes).
>>> 
>>> So, a bit hacky, but using a custom unpickler and then matching on a
>>> compatibility class (that subclasses ndarray) allows me to return the
>>> correct class.
>>> 
>>> The good thing here is that this whole routine isn't even called unless
>>> there is a TypeError on the original unpickle
>>> 
>>> whoosh!
>>> 
>>> --------
>>> # new module: compat/unpickle_compat.py
>>> 
>>> import numpy as np
>>> import pandas
>>> from pandas.core.series import Series
>>> from pandas.sparse.series import SparseSeries
>>> import pickle
>>> 
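>>> class Unpickler(pickle.Unpickler):
>>>   # copy the dispatch table so the stock pickle.Unpickler is not modified
>>>   dispatch = pickle.Unpickler.dispatch.copy()
>>> 
>>> def load_reduce(self):
>>>   # replacement handler for the REDUCE opcode
>>>   stack = self.stack
>>>   args = stack.pop()
>>>   func = stack[-1]
>>>   if args and type(args[0]) is type:
>>>       n = args[0].__name__
>>>       # matched a compat class: return an instance of the new type instead
>>>       if n == 'DeprecatedSeries':
>>>           stack[-1] = object.__new__(Series)
>>>           return
>>>       elif n == 'DeprecatedSparseSeries':
>>>           stack[-1] = object.__new__(SparseSeries)
>>>           return
>>> 
>>>   value = func(*args)
>>>   stack[-1] = value
>>> 
>>> Unpickler.dispatch['R'] = load_reduce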
>>> 
>>> def load(file):
>>>   # try to load a compatibility pickle
>>>   # fake the old class hierarchy
>>>   # if it works, then return the new type objects
>>> 
>>>   try:
>>>       pandas.core.series.Series = DeprecatedSeries
>>>       pandas.sparse.series.SparseSeries = DeprecatedSparseSeries
>>>       with open(file,'rb') as fh:
>>>           return Unpickler(fh).load()
>>>   finally:
>>>       # always restore the real classes, even if unpickling fails
>>>       pandas.core.series.Series = Series
>>>       pandas.sparse.series.SparseSeries = SparseSeries
>>> 
>>> class DeprecatedSeries(Series, np.ndarray):
>>>   pass
>>> 
>>> class DeprecatedSparseSeries(DeprecatedSeries):
>>>   pass
>>> 
>>> 
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> http://mail.python.org/mailman/listinfo/pandas-dev
>> 
>> Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps? I
>> would prefer to get a msgpack or Avro-based serialization format for
>> Series or DataFrame sorted out before we start gutting the internals
>> of the objects.
>> 
>> - Wes

