[Pandas-dev] pickle is evil
Jeff Reback
jeffreback at gmail.com
Mon Apr 22 03:19:48 CEST 2013
I realized I didn't answer your question. This just catches on pickle.load:
try:
    pickle.load
except TypeError:
    pickle_compat.load
except:
    if not PY3:
        raise
    # try to unpickle with an encoding here
On Apr 21, 2013, at 9:12 PM, Jeff Reback <jeffreback at gmail.com> wrote:
> avro (better choice than msgpack I think)
> will be very straightforward add on
>
> the format should probably be done independently of the internals anyhow, at the price of a bit more code; or it could store BlockManagers and be somewhat simpler code-wise
>
>
>
> On Apr 21, 2013, at 9:01 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>
>> On Sun, Apr 21, 2013 at 3:01 PM, Jeff Reback <jeffreback at gmail.com> wrote:
>>> I thought I'd share a particularly evil pickle issue. In my refactor of
>>> Series to not subclass ndarray, the new pickling tests were breaking. No
>>> surprise, because I changed __getstate__ to pickle via the BlockManager. In
>>> order to ensure compat I thought I could just fix __setstate__ and figure out
>>> what to do based on the returned state (e.g. the length of the state returned
>>> as a tuple or dict or whatever).
>>>
>>> But no...apparently the reconstruction algorithm takes the class name that
>>> it sees and tries to create it w/o using __new__ (or anything else that you
>>> can intercept); it uses a builtin method called _reconstruct, which I can't
>>> figure out how to override at all (it must be C-only code).
>>>
>>> And then numpy gets ahold of it (as it's an extension type) and complains
>>> because the class I am trying to instantiate actually isn't a subclass of
>>> ndarray (which it presupposes).
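You can see the _reconstruct helper named directly in the pickle stream; a small illustration (MySeries is a throwaway ndarray subclass standing in for the old Series, not pandas code):

```python
import pickle

import numpy as np

class MySeries(np.ndarray):
    # minimal ndarray subclass, standing in for the pre-refactor Series
    pass

arr = np.asarray([1, 2, 3]).view(MySeries)
payload = pickle.dumps(arr)

# the stream references numpy's _reconstruct helper rather than MySeries
# itself, so the subclass's __new__/__init__ are never consulted on load
assert b'_reconstruct' in payload

roundtripped = pickle.loads(payload)
assert type(roundtripped) is MySeries
```

This is why intercepting __new__ on the replacement class doesn't help: _reconstruct builds the instance itself and presupposes an ndarray subclass.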
>>>
>>> So, a bit hacky, but using a custom unpickler, then matching on a
>>> compatibility class (that subclasses from ndarray), allows me to return the
>>> correct class.
>>>
>>> The good thing here is that this whole routine isn't even called unless
>>> there is a TypeError on the original unpickle.
>>>
>>> whoosh!
>>>
>>> --------
>>> # new module: compat/unpickle_compat.py
>>>
>>> import numpy as np
>>> import pandas
>>> from pandas.core.series import Series
>>> from pandas.sparse.series import SparseSeries
>>> import pickle
>>>
>>> class Unpickler(pickle.Unpickler):
>>>     pass
>>>
>>> def load_reduce(self):
>>>     stack = self.stack
>>>     args = stack.pop()
>>>     func = stack[-1]
>>>     if type(args[0]) is type:
>>>         n = args[0].__name__
>>>         if n == 'DeprecatedSeries':
>>>             stack[-1] = object.__new__(Series)
>>>             return
>>>         elif n == 'DeprecatedSparseSeries':
>>>             stack[-1] = object.__new__(SparseSeries)
>>>             return
>>>
>>>     value = func(*args)
>>>     stack[-1] = value
>>>
>>> Unpickler.dispatch['R'] = load_reduce
>>>
>>> def load(file):
>>>     # try to load a compatibility pickle
>>>     # fake the old class hierarchy
>>>     # if it works, then return the new type objects
>>>
>>>     try:
>>>         pandas.core.series.Series = DeprecatedSeries
>>>         pandas.sparse.series.SparseSeries = DeprecatedSparseSeries
>>>         with open(file, 'rb') as fh:
>>>             return Unpickler(fh).load()
>>>     finally:
>>>         pandas.core.series.Series = Series
>>>         pandas.sparse.series.SparseSeries = SparseSeries
>>>
>>> class DeprecatedSeries(Series, np.ndarray):
>>>     pass
>>>
>>> class DeprecatedSparseSeries(DeprecatedSeries):
>>>     pass
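The module-attribute swap in load() works because pickle resolves globals by module and attribute name at load time, not by object identity. A self-contained illustration of the same trick (fake_mod, Old, and Standin are invented names, unrelated to pandas):

```python
import pickle
import sys
import types

# register a throwaway module and pickle an instance of a class in it
mod = types.ModuleType('fake_mod')
sys.modules['fake_mod'] = mod

class Old(object):
    pass

Old.__module__ = 'fake_mod'
Old.__qualname__ = 'Old'
mod.Old = Old

payload = pickle.dumps(Old())

# now swap a stand-in class in under the same name, as load() does with
# DeprecatedSeries; unpickling looks the global up and finds the stand-in
class Standin(object):
    pass

Standin.__module__ = 'fake_mod'
mod.Old = Standin

obj = pickle.loads(payload)
assert type(obj) is Standin
```

This is why restoring the real Series in the finally block matters: the patched names are visible to every unpickle that happens while the swap is in place.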
>>>
>>>
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> http://mail.python.org/mailman/listinfo/pandas-dev
>>
>> Yes, pickle is evil. Will this fix affect pickle.loads/pickle.dumps? I
>> would prefer to get a msgpack or Avro-based serialization format for
>> Series or DataFrame sorted out before we start gutting the internals
>> of the objects.
>>
>> - Wes