pickle alternative

simonwittber at gmail.com
Mon Jul 4 22:45:07 EDT 2005


Ok, I've attached the proto PEP below.

Comments on the proto PEP and the implementation are appreciated.

Sw.



Title: Secure, standard serialization of simple Python types.

Abstract

    This PEP suggests the addition of a module to the standard library,
    which provides a serialization class for simple Python types.


Copyright

    This document is placed in the public domain.


Motivation

    The standard library currently provides two modules for object
    serialization: pickle and marshal. Pickle is insecure by its very
    nature, and the marshal documentation clearly marks that module as
    not secure. The marshal module also does not guarantee
    compatibility between Python versions. The proposed module will
    serialize only simple built-in Python types, and will provide
    compatibility across Python versions.

    See RFE 467384 (on SourceForge) for more discussion on the above
    issues.


Specification

    The proposed module should use the same API as the marshal module:

        dump(value, file)
        # serialize value and write it to an open file object
        load(file)
        # read data from a file object, unserialize it and return an object
        dumps(value)
        # return the string that would be written to the file by dump
        loads(string)
        # unserialize the string and return an object
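
    To make the four-function API above concrete, here is a minimal
    sketch of a type-tagged serializer exposing dump/load/dumps/loads.
    It is not the gherkin implementation or its wire format: the
    one-byte tags, the length prefixes and the helper names (_write,
    _read, _read_exact, _read_len) are all assumptions made for
    illustration, and it targets current Python 3 rather than the
    Python of this post.

```python
import io
import struct

# Hypothetical one-byte type tags; NOT the gherkin wire format.
# n=None t=True f=False i=int d=float s=str b=bytes l=list u=tuple m=dict


def _write(value, out):
    # Exact type checks: subclasses and arbitrary objects are rejected.
    if value is None:
        out.write(b"n")
    elif value is True:
        out.write(b"t")
    elif value is False:
        out.write(b"f")
    elif type(value) is int:
        text = str(value).encode("ascii")  # decimal text handles big ints
        out.write(b"i" + struct.pack(">I", len(text)) + text)
    elif type(value) is float:
        out.write(b"d" + struct.pack(">d", value))
    elif type(value) is str:
        data = value.encode("utf-8")
        out.write(b"s" + struct.pack(">I", len(data)) + data)
    elif type(value) is bytes:
        out.write(b"b" + struct.pack(">I", len(value)) + value)
    elif type(value) in (list, tuple):
        tag = b"l" if type(value) is list else b"u"
        out.write(tag + struct.pack(">I", len(value)))
        for item in value:
            _write(item, out)
    elif type(value) is dict:
        out.write(b"m" + struct.pack(">I", len(value)))
        for key, item in value.items():
            _write(key, out)
            _write(item, out)
    else:
        raise TypeError("cannot serialize %r" % type(value))


def _read_exact(src, n):
    data = src.read(n)
    if len(data) != n:
        raise ValueError("truncated data")
    return data


def _read_len(src):
    return struct.unpack(">I", _read_exact(src, 4))[0]


def _read(src):
    # Only the fixed set of tags is understood; nothing here imports
    # modules or calls objects named by the input.
    tag = _read_exact(src, 1)
    if tag == b"n":
        return None
    if tag == b"t":
        return True
    if tag == b"f":
        return False
    if tag == b"i":
        return int(_read_exact(src, _read_len(src)).decode("ascii"))
    if tag == b"d":
        return struct.unpack(">d", _read_exact(src, 8))[0]
    if tag == b"s":
        return _read_exact(src, _read_len(src)).decode("utf-8")
    if tag == b"b":
        return _read_exact(src, _read_len(src))
    if tag == b"l":
        return [_read(src) for _ in range(_read_len(src))]
    if tag == b"u":
        return tuple(_read(src) for _ in range(_read_len(src)))
    if tag == b"m":
        result = {}
        for _ in range(_read_len(src)):
            key = _read(src)
            result[key] = _read(src)
        return result
    raise ValueError("unknown type tag %r" % tag)


def dump(value, file):
    _write(value, file)


def load(file):
    return _read(file)


def dumps(value):
    out = io.BytesIO()
    _write(value, out)
    return out.getvalue()


def loads(data):
    return _read(io.BytesIO(data))


record = {"id": 7, "tags": ("a", "b"), "score": 1.5}
assert loads(dumps(record)) == record
```

    The sketch has no limits on nesting depth or declared lengths, so
    a real implementation intended for untrusted input would need to
    add them.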


Reference Implementation

    http://metaplay.dyndns.org:82/~simon/gherkin.py.txt


Rationale

    The marshal documentation explicitly states that it is unsuitable
    for unmarshalling untrusted data. It also explicitly states that
    the format is not compatible across Python versions.

    Pickle is compatible across versions, but is also unsafe for
    loading untrusted data. Exploits demonstrating this vulnerability
    exist.
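
    As a harmless illustration of why unpickling untrusted data is
    unsafe, the sketch below uses the documented __reduce__ hook to
    make pickle.loads call a callable chosen by whoever produced the
    data. The class name Sneaky and the choice of sorted() are
    assumptions for the demonstration; a hostile payload would name a
    far more dangerous callable.

```python
import pickle


class Sneaky:
    # __reduce__ tells pickle how to "rebuild" the object: at load time
    # it calls the returned callable with the returned arguments.
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())

# Loading does not give back a Sneaky instance. It gives back whatever
# the callable named in the payload returned.
result = pickle.loads(payload)
print(type(result), result)
```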

    xmlrpclib provides serialization functions, but is unsuitable for
    serializing large data structures, or when high performance is a
    requirement. If performance is an issue, a C-based accelerator
    module can be installed. If size is an issue, gzip can be used;
    however, compression costs time, so size and performance have to
    be traded off against each other.
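
    The size concern can be checked with a rough sketch like the one
    below. It assumes a current Python 3, where xmlrpclib lives on as
    xmlrpc.client, and uses marshal only as a convenient compact
    binary baseline. The sample data (a list of 1000 small integers)
    is arbitrary, and the sizes will vary with the data.

```python
import gzip
import marshal
import xmlrpc.client

data = list(range(1000))

# xmlrpc.client.dumps expects a tuple of parameters and returns a str.
xml_bytes = xmlrpc.client.dumps((data,)).encode("utf-8")
binary_bytes = marshal.dumps(data)
compressed_xml = gzip.compress(xml_bytes)

print("XML-RPC:        ", len(xml_bytes), "bytes")
print("marshal:        ", len(binary_bytes), "bytes")
print("XML-RPC + gzip: ", len(compressed_xml), "bytes")

# The XML form still round-trips; loads returns (params, methodname).
params, _method = xmlrpc.client.loads(xml_bytes.decode("utf-8"))
assert params[0] == data
```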

    Other existing formats, such as JSON and Bencode (BitTorrent), do
    not handle all of the simple Python types, and/or fail on some
    marginally complex Python structures.
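
    For JSON, the gaps can be seen with the json module that later
    joined the standard library (it did not exist there when this was
    written, so its use here is an assumption for illustration).
    Tuples come back as lists, non-string dictionary keys come back
    as strings, and several simple types are rejected outright.

```python
import json

# Tuples are written as JSON arrays, so they come back as lists.
tuple_round_trip = json.loads(json.dumps((1, 2)))

# JSON object keys are always strings, so integer keys change type.
dict_round_trip = json.loads(json.dumps({1: "one"}))

# Some simple built-in types cannot be encoded at all.
rejected = []
for value in (1 + 2j, {1, 2}, b"raw"):
    try:
        json.dumps(value)
    except TypeError:
        rejected.append(type(value).__name__)

print(tuple_round_trip, dict_round_trip, rejected)
```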

    Time and space efficiency and security do not have to be mutually
    exclusive features of a serializer. The Python standard library
    does not provide a serializer which is time and space efficient
    and can also work safely with untrusted data. The proposed gherkin
    module goes some way towards achieving this. The format is simple
    enough that interoperable implementations can easily be written
    on other platforms.




More information about the Python-list mailing list