[Python-ideas] Adding a safe alternative to pickle in the standard library

Bruce Leban bruce at leapyear.org
Fri Feb 22 00:39:19 CET 2013


On Thu, Feb 21, 2013 at 3:01 AM, Devin Jeanpierre <jeanpierreda at gmail.com>
wrote:

> I've been noticing a lot of security-related issues being discussed in
> the Python world since the Ruby YAML problem came out. Is it time to
> consider adding an alternative to pickle that is safe(r) by default?
>
> Pickle is usable in situations few other things are, because it can
> handle cyclic references and virtually any python object. The only
> stdlib alternative I'm aware of is json, which can do neither of those
> things. (Or at least, not without significant extra serialization
> code.) I would imagine that any alternative supplied should be easy
> enough to use that pickle users would seriously consider switching,
> and include at least those features.
>

Pickle is unsafe if you give it untrusted input. It's safe if you pickle
something yourself and then unpickle it. If the problem is that you want to
pickle something, store it in some unsafe place (like a cookie or a db
under user control), and then read it back in later and unpickle it, you
can mitigate the risk by using an HMAC or some other mechanism to detect
tampering. You may also want to consider encrypting it.
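
As a rough illustration of the HMAC idea, here is a minimal sketch. The
names SECRET_KEY, dumps_signed and loads_signed are hypothetical, and the
choice of SHA-256 is an assumption. The signature is checked before
anything is handed to pickle.loads.

```python
import hashlib
import hmac
import pickle

# Hypothetical secret for illustration only; load a real one from configuration.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key=SECRET_KEY):
    """Verify the tag first; unpickle only if the payload is untampered."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

This only shows that the data came from someone holding the key. It does
not hide the contents, which is where encryption would come in.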

That said, there is one risk in pickling something yourself and unpickling
it later that you need to watch out for. If your objects change, then
unpickling might produce unexpected and even potentially unsafe results.
You can mitigate this by adding a version number to your objects (as long
as you don't forget to update it when the object changes).
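
One way to do the versioning, sketched under assumptions: the Account
class, its fields and the "_version" key are all made up for illustration.
The hooks __getstate__ and __setstate__ are the ones pickle calls when
saving and restoring an instance.

```python
class Account:
    # Hypothetical example class. Bump VERSION whenever the stored fields change.
    VERSION = 2

    def __init__(self, name, email=None):
        self.name = name
        self.email = email  # pretend this field was added in version 2

    def __getstate__(self):
        # Called by pickle when saving: record the version next to the data.
        state = dict(self.__dict__)
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        # Called by pickle when loading: upgrade old states, reject unknown ones.
        state = dict(state)
        version = state.pop("_version", 1)  # states without a version count as 1
        if version > self.VERSION:
            raise ValueError("state written by a newer version: %r" % version)
        if version < 2:
            state.setdefault("email", None)
        self.__dict__.update(state)
```

With this in place, a pickle written before the email field existed loads
with email set to None instead of producing an object that is missing an
attribute.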

There's another problem - pickling is not guaranteed to work across Python
versions. So you may find yourself having to read pickles that are no
longer readable in a future Python version. Not a problem for cookies, but
a potential headache with long-lived pickles.

All of this leads me to suggest using a better format for this problem.
JSON is a reasonable choice (I've used it myself), although I would still
use an HMAC. If you encrypt it, that makes attacking the object that
much harder. I'd advise against using your own format. I wrote a tutorial
on hacking web sites called Gruyere <http://j.mp/gruyere-security>. I
suggest reading the section on cookies
<http://j.mp/learn-state-manipulation> (although to be honest, I recommend
reading the whole thing :-)

Aside from security, using a format like JSON encourages you to think about
what belongs in the persisted object and what doesn't. Suppose your object
includes a URL. If you pickle it, you may end up persisting the parsed URL
with a dictionary of parameters and other unnecessary overhead. When you
convert to JSON, you're going to just copy the URL.
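
To make the URL example concrete, here is a small sketch. The Bookmark
class and its to_json/from_json methods are hypothetical. Only the title
and the raw URL string are persisted, and the parsed pieces are rebuilt on
load.

```python
import json
from urllib.parse import parse_qs, urlsplit


class Bookmark:
    # Hypothetical example class holding a URL plus data derived from it.
    def __init__(self, title, url):
        self.title = title
        self.url = url
        # Derived data: cheap to recompute, so there is no need to persist it.
        self.parsed = urlsplit(url)
        self.params = parse_qs(self.parsed.query)

    def to_json(self):
        # Persist only what cannot be recomputed.
        return json.dumps({"title": self.title, "url": self.url})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        # Going through __init__ re-parses the URL and validates the shape.
        return cls(data["title"], data["url"])
```

Pickling a Bookmark would store parsed and params as well. The JSON form
forces the decision about what actually needs to be kept.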



On Thu, Feb 21, 2013 at 7:50 AM, Dustin J. Mitchell <dustin at v.igoro.us> wrote:

> This conversation worries me.  The security community has shown that
> safety isn't something you can add to a powerful tool.  With great power
> comes great expressivity, and correspondingly more difficulty reasoning
> about it.  Not to mention reasoning about the implementation.  JSON is
> probably secure against code-execution exploits, but only probably.
>
> When you put something in the stdlib and call it "safe", even with
> caveats, people will make even more brazen mistakes than with a
> documented-unsafe tool like pickle.
>
Yes indeed.

--- Bruce
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com