marshal vs pickle

Paul Rubin http
Thu Nov 1 23:42:06 EDT 2007


Aaron Watters <aaron.watters at gmail.com> writes:
> >   >>> marshal.loads('RKp,U\xf7`\xef\xe77\xc1\xea\xd8\xec\xbe\\')
> >   Segmentation fault
> >...
> I'll grant you the above as a denial of service attack. ...
> Can you give me an example
> where someone can erase the filesystem using marshal.load?  

You should always assume that if an attacker can induce a memory fault
(typically through a buffer overflow) then s/he can inject and run
arbitrary machine code and take over the process.  It's not even worth
looking for a specific exploit--this type of thing MUST be fixed if
the function can be exposed to untrusted data.  Yes it should be
possible to fix the segfault in marshal--but in principle pickle could
be locked down as well, at least from these code injection attacks.
It's just something the python stdlib doesn't currently address, for
whatever reason.

BTW, if denial of service counts, I think that you also have to check for
algorithmic complexity attacks against Python dictionary objects.
I.e. by constructing a serialized dictionary whose keys all hash to
the same number, you can possibly make the deserializer use quadratic
runtime, bringing the remote process to its knees with a dictionary of
a few million elements, a not-unreasonable size for applications like
database dumps.  (I haven't checked yet what actually happens in
practice if you try this, given that the already-known problems with
pickle and marshal are even worse).  This can't really be fixed in the
serialization format.  Either the deserializer should run in a
controlled environment (enforced resource bounds) or (preferably) the
underlying dict implementation should change to resist this attack.

For more info, see: http://www.cs.rice.edu/~scrosby/hash/



More information about the Python-list mailing list