marshal vs pickle

Aaron Watters aaron.watters at gmail.com
Thu Nov 1 16:35:15 EDT 2007


On Nov 1, 2:15 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> On Nov 1, 4:45 am, Aaron Watters <aaron.watt... at gmail.com> wrote:
>
> > Marshal is more secure than pickle
>
> "More" or "less" make little sense in a security context which
> typically is an all or nothing affair.  Neither module is designed for
> security.  From the docs for marshal:
>
> '''
> Warning: The marshal module is not intended to be secure against
> erroneous or maliciously constructed data. Never unmarshal data
> received from an untrusted or unauthenticated source.
> '''
>
> If security is a focus, then use xmlrpc or some other tool that
> doesn't construct arbitrary code objects.

I disagree.  xmlrpc is insecure if you compile and execute one of the
strings you get from it.  Marshal is similarly insecure if you evaluate
a code object it hands you.  If you aren't that dumb, then neither one
is a problem.  As far as I'm concerned, marshal.load is no more
insecure than file.read.
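
A minimal sketch of that point, assuming Python 3 (the names `source`,
`payload` and `ran` are made up for illustration).  It only shows that
loading a marshalled code object does not run it; it says nothing about
how marshal copes with malformed input.

```python
import marshal

# Hypothetical payload: the sender compiles some source and marshals the code object.
ran = []
source = "ran.append('executed')"
payload = marshal.dumps(compile(source, "<payload>", "exec"))

# Loading rebuilds a code object but does not run it.
obj = marshal.loads(payload)
loaded_is_code = type(obj).__name__ == "code"
ran_after_load = list(ran)      # still empty at this point

# Only an explicit exec (or eval) by the receiver runs the code.
exec(obj, {"ran": ran})
ran_after_exec = list(ran)      # now contains 'executed'
```

The receiver has to call exec or eval itself before anything in the
payload runs.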

Pickle, on the other hand, can execute just about anything without
you knowing anything about it.  It is a horrendous mistake
to suggest that anyone should implement RPC using pickle.  If they
want it to be fast, they can use marshal; the catch is that the marshal
format is not guaranteed to be portable across Python versions, which
was a design mistake, imho.
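
A harmless sketch of what "execute just about anything" means, assuming
Python 3.  The class name `Payload` is hypothetical, and `eval` on an
arithmetic string stands in for whatever callable an attacker would
pick.

```python
import pickle

class Payload(object):
    """Hypothetical object whose pickle asks the loader to call eval."""
    def __reduce__(self):
        # (callable, args): the unpickler calls callable(*args) on load.
        return (eval, ("6 * 7",))

data = pickle.dumps(Payload(), 2)

# The receiver only calls loads, yet eval runs with the sender's string.
result = pickle.loads(data)
```

Here `result` is the value returned by eval, not a Payload instance, so
the call happened inside pickle.loads without the receiver asking for it.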

By the way, here is a test program which shows pickle (cPickle,
protocol 2) running 4 times slower than marshal on my machine using
Python 2.5.1:

"""
import marshal
import cPickle
import time

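def pdump(value, f):
    # Serialize in memory with pickle protocol 2; f is unused (file output disabled).
    #cPickle.dump(value, f, 2)
    return cPickle.dumps(value, 2)

def mdump(value, f):
    # Serialize in memory with marshal; f is unused (file output disabled).
    #marshal.dump(value, f)
    return marshal.dumps(value)

def test(dump, fn):
    # Time three rounds of building a 200000-entry dict and dumping its items.
    now = time.time()
    #f = open(fn, "wb")
    f = None
    for i in range(3):
        D = {}
        for j in range(200000):
            k = (i*133+j*119)%151
            D[ (str(k),str(j)) ] = (str(i), [k, str(k)])
        dump(D.items(), f)
    #f.close()
    elapsed = time.time()-now
    print dump, elapsed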

if __name__=="__main__":
    test(mdump, "mdump.dat")
    test(pdump, "ptemp.dat")
"""

  -- Aaron Watters
===
If you think you are smart enough to write multi-threaded programs
you're not.   -- Jim Ahlstrom's corollary to Murphy's Law.

http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=ahlstrom



