[Python-ideas] Adding a safe alternative to pickle in the standard library

Steve Dower Steve.Dower at microsoft.com
Fri Feb 22 17:59:32 CET 2013


> From: Stephen J. Turnbull [mailto:stephen at xemacs.org]
> Steve Dower writes:
> > To be really explicit, I would make load/loads only work with
> > built-in types. For compatibility when reading earlier protocols we
> > could add a type representing a class instance and its members that
> > doesn't actually construct anything. (Maybe we override __getattr__
> > to make it lazily construct the instance when the program actually
> > wants to use it?)
> 
> I am not a security expert, but it seems to me that's going in the wrong
> direction.  Unpickler would *still* run constructor code automatically under
> some circumstances -- but those circumstances become murkier.

Agreed on the bit in parentheses; that's probably questionable enough to drop. However, if load/loads only works with built-in types, then no user code will run. IIUC we already have a C implementation of pickle that is immune to users redefining builtins (if not, we should do this too). Pickled objects would be unpickled as (effectively) a tuple of the name and the members: ('system', ("echo Hello World",)) does not execute any code. (And yeah, wrap that tuple up in a type that can be tested.)
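To make that concrete, here is a rough sketch of the "wrap it in a type that doesn't construct anything" idea. The names Inert, InertUnpickler and inert_loads are made up for illustration; nothing like them exists in the stdlib, and I have only exercised the REDUCE path shown here, not every opcode or protocol.

```python
import io
import pickle


class Inert:
    """Hypothetical placeholder: records what the pickle asked for, runs nothing."""

    module = name = None

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        obj.args = args  # keep the would-be constructor arguments as plain data
        return obj

    def __init__(self, *args, **kwargs):
        pass  # swallow the arguments; the real callable is never looked up


class InertUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Never import anything: hand back a stand-in type carrying the names.
        return type(name, (Inert,), {"module": module, "name": name})


def inert_loads(data):
    return InertUnpickler(io.BytesIO(data)).load()


class Evil:
    def __reduce__(self):
        import os
        # Asks the unpickler to call os.system("echo Hello World").
        return (os.system, ("echo Hello World",))


obj = inert_loads(pickle.dumps(Evil()))
print(obj.name, obj.args)  # the name and arguments survive; no command is run
```

The result is an object you can test with isinstance(obj, Inert) and inspect, which is roughly the ('system', (...)) tuple above with a type around it.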

> > For convenience, I'd add a parameter to Unpickler to let the user
> > provide a set of types that are allowed to be constructed (or a
> > mapping from names to callables as used in find_class()).
> 
> And this is secure, why?  There's no way to decorate the allowed types to
> add nasty stuff to the pickled class definitions (including built-in types), right?

Classes and functions are pickled only by name, never by value. Unpickler resolves the names and returns the class or function reference in the current environment. If it can't find the module or name in its current environment, it raises an error.
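A quick way to see the by-name behaviour with the stock pickle module. The two byte strings at the end are hand-written protocol 0 pickles that I made up to reference names that don't exist; try_load is just a helper for this example.

```python
import pickle

# Pickling a function stores only its module and name, not its code.
data = pickle.dumps(len)
print(b"builtins" in data, b"len" in data)

# Unpickling looks the name up again and returns the very same object.
print(pickle.loads(data) is len)


def try_load(raw):
    """Return the unpickled object, or the lookup error that was raised."""
    try:
        return pickle.loads(raw)
    except (ImportError, AttributeError) as exc:
        return exc


# GLOBAL opcode ('c') followed by module and name, then STOP ('.').
missing_module = try_load(b"cno_such_module_xyz\nthing\n.")
missing_name = try_load(b"cbuiltins\nno_such_name_xyz\n.")
print(type(missing_module).__name__, type(missing_name).__name__)
```

So whatever class definition the sender had is irrelevant; the receiver only ever gets what its own environment binds to that name.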

> There are no bugs that allow a back door, right? 

Of course not. That's why we never see security patches or updates for operating systems or platforms. This is a silly argument.

> Is the API sufficiently well-designed that users will easily figure out how to
> do what they need, and *only* what they need, and therefore won't be
> tempted to simply turn on permission to do *everything*? 

All we can ever do is provide instructions to keep the developer safe and make it clear that ignoring those rules will reduce the security of their program. It's up to the developer to make the right decisions.

> And they won't give up, and write their own?

In my experience, people more often write their own out of ignorance than out of frustration (same for unnecessarily using XML). Or they'll switch to an earlier version of Python that doesn't have this change in it. Again, we can encourage, but not dictate.

> Isn't it better just to give users the advice to use JSON where it will do?
> Perhaps the difference in APIs will give them pause to think again if they're
> starting to think about unpickling classes?

Maybe, though since pickle is literally the Python equivalent of JSON (should it have been called Python Object Notation (PON)? Probably not...) we should be ensuring that it is the best it can be.

> Granted, I don't have answers to those questions (except for myself!) But I
> think some thought should be given to them before trying to create a
> restricted pickle protocol and make it default.  Restricted
> modes/protocols/sublanguages are hard to get right.

Agreed. I don't think we need a new protocol though, just a less permissive default implementation of Unpickler.find_class().
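Roughly what I have in mind, as a sketch on top of today's pickle.Unpickler. RestrictedUnpickler, restricted_loads and the allowed parameter are my own names, not an existing API. Note that with an empty allow-list this only accepts values that have dedicated opcodes (None, bool, int, float, str, bytes, list, tuple, dict and so on); anything pickled via a global lookup is refused.

```python
import io
import pickle
from fractions import Fraction


class RestrictedUnpickler(pickle.Unpickler):
    """Sketch: deny every global unless the caller explicitly allowed it."""

    def __init__(self, file, allowed=None, **kwargs):
        super().__init__(file, **kwargs)
        # Hypothetical parameter: maps (module, name) to the callable to use.
        self._allowed = dict(allowed or {})

    def find_class(self, module, name):
        try:
            return self._allowed[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not allowed"
            ) from None


def restricted_loads(data, allowed=None):
    return RestrictedUnpickler(io.BytesIO(data), allowed=allowed).load()


class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo Hello World",))


# Plain built-in data never reaches find_class, so it loads as usual.
plain = restricted_loads(pickle.dumps([1, "a", {"k": (2.0, None)}]))

# A payload that names os.system is rejected before anything is called.
try:
    restricted_loads(pickle.dumps(Evil()))
    rejected = False
except pickle.UnpicklingError:
    rejected = True

# Opting in to one specific type.
third = restricted_loads(
    pickle.dumps(Fraction(1, 3)),
    allowed={("fractions", "Fraction"): Fraction},
)
print(plain, rejected, third)
```

The existing protocols are untouched; only the lookup policy changes, and the allow-list is the only way to widen it.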


