pickle, modules, and ImportErrors

Chris Angelico rosuav at gmail.com
Wed Jan 7 20:06:27 EST 2015


On Thu, Jan 8, 2015 at 11:23 AM, John Ladasky
<john_ladasky at sbcglobal.net> wrote:
>> P.S. don't use pickle, it is a security vulnerability equivalent in
>> severity to using exec in your code, and an unversioned opaque
>> schemaless blob that is very difficult to work with when circumstances
>> change.
>
> For all of its shortcomings, I can't live without pickle.  In this case, I am doing data mining.  My TrainingSession class commandeers seven CPU cores via multiprocessing.Pool.  Still, even my "toy" TrainingSessions take several minutes to run.  I can't afford to re-run TrainingSession every time I need my models.  I need a persistent object.
>
> Besides, the opportunity for mischief is low.  My code is for my own personal use.  And I trust the third-party libraries that I am using.  My SVRModel object wraps the NuSVR object from scikit-learn, which in turn wraps the libsvm binary.

There are several issues, not all of which are easily dodged. Devin cited two:

* Security: it's fundamentally equivalent to using 'exec'
* Unversioned: it's hard to make updates to your code and then load old data
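
To make the first point concrete, here is a minimal sketch of why
unpickling amounts to running code. The Sneaky class is hypothetical
and the payload is deliberately harmless (it just evaluates "6 * 7"),
but the callable returned from __reduce__ could be anything the
attacker likes:

```python
import pickle

class Sneaky:
    """Hypothetical stand-in for a maliciously crafted object."""
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # The call happens inside pickle.loads(), on the loading side.
        return (eval, ("6 * 7",))

payload = pickle.dumps(Sneaky())

# Nothing here asks for code to run, yet eval() is called during loading.
result = pickle.loads(payload)
print(type(result), result)
```

What comes back isn't a Sneaky at all; it's whatever the callable
returned. That is why loading a pickle from an untrusted source is
treated like exec'ing untrusted code.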

"For your own personal use" dodges the first one, but makes the second
one even more of a concern. You can get much better persistence using
a textual format like JSON, and adding a simple 'version' member makes
format changes easier to handle: when you change the code, the loader
can check the version and cope with old data fairly readily.
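
A rough sketch of what I mean by a 'version' member. Everything here
is made up for illustration (the function names, the "nu" and "gamma"
parameters, the default of 0.1); it assumes the model's state can be
reduced to plain numbers, strings, lists and dicts:

```python
import json

CURRENT_VERSION = 2

def save_model(params):
    """Serialise a dict of plain values, tagged with a format version."""
    return json.dumps({"version": CURRENT_VERSION, "params": params})

def load_model(text):
    """Load saved parameters, upgrading older formats where possible."""
    data = json.loads(text)
    version = data.get("version", 1)  # untagged data is treated as v1
    params = dict(data["params"])
    if version == 1:
        # Hypothetical change: v2 added 'gamma'; old data gets a default.
        params.setdefault("gamma", 0.1)
        version = 2
    if version != CURRENT_VERSION:
        raise ValueError("unsupported version: %r" % (version,))
    return params

old_blob = '{"version": 1, "params": {"nu": 0.5}}'
print(load_model(old_blob))
print(load_model(save_model({"nu": 0.5, "gamma": 0.2})))
```

Each format change adds one more "if version == N" upgrade step, so
old files keep loading. NumPy arrays aren't JSON-serialisable as they
stand; converting them with .tolist() before saving is one option.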

Pickle is still there if you want it, but you do have to be aware of
its limitations. If you edit the TrainingSession class, you may well
have to rerun the training... but maybe that's not a bad thing.
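
The reason editing (or moving, or renaming) a class can bite you is
that a pickle doesn't contain the class itself, only its module and
name, which get looked up again at load time. A small sketch using
fractions.Fraction from the standard library as a stand-in for your
own class:

```python
import pickle
from fractions import Fraction

blob = pickle.dumps(Fraction(1, 3))

# The pickle refers to the class by module name and class name only;
# both appear in the byte stream as plain text.
has_module_name = b"fractions" in blob
has_class_name = b"Fraction" in blob
print(has_module_name, has_class_name)

# Loading imports the module and looks the class up by name, so this
# works only while that module and class still exist where the pickle
# says they do.
restored = pickle.loads(blob)
print(restored)
```

If the module can't be imported at load time you get an ImportError,
and if the class has gone from the module you get an AttributeError.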

ChrisA