pickle alternative

Andrew Dalke dalke at dalkescientific.com
Tue May 31 03:11:12 EDT 2005


simonwittber wrote:
> I've written a simple module which serializes these python types:
> 
> IntType, TupleType, StringType, FloatType, LongType, ListType, DictType

For simple data types consider "marshal" as an alternative to "pickle".
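
As a rough sketch of what that looks like (written for a current Python 3; the sample value is made up for illustration), marshal round-trips nested built-in types through its dumps and loads functions:

```python
import marshal

# Made-up sample built only from the simple types listed above.
sample = {"name": "spam", "counts": [1, 2, 3], "point": (1.5, -2.0), "big": 10 ** 30}

encoded = marshal.dumps(sample)    # a bytes object
decoded = marshal.loads(encoded)   # rebuilds an equal object
```

Keep in mind that the marshal format is not guaranteed to stay the same between Python versions, so it suits short-lived data better than long-term storage.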

> It appears to work faster than pickle, however, the decode process is
> much slower (5x) than the encode process. Has anyone got any tips on
> ways I might speed this up?


   def dec_int_type(data):
       value = int(unpack('!i', data.read(4))[0])
       return value

That 'int' isn't needed -- unpack returns an int not a string
representation of the int.
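
A quick check you can run yourself (a sketch for a current Python 3; the value 1234 is arbitrary):

```python
from struct import pack, unpack

raw = pack("!i", 1234)          # 4 bytes, network byte order
(value,) = unpack("!i", raw)    # unpack always returns a tuple

# The element is already an int, so wrapping it in int() adds only a call.
is_int = isinstance(value, int)
```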

BTW, your code won't work on 64 bit machines where a Python int is
8 bytes wide; such a value will not fit in the 4 bytes that '!i' packs.
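
A sketch of that failure, assuming the trouble is an int too wide for the 4-byte '!i' field. The value 2 ** 40 is just an example, and '!q' (an 8-byte signed field) is one possible workaround, not something from your module:

```python
import struct

big = 2 ** 40   # fits in a 64-bit int, but not in 4 bytes

try:
    struct.pack("!i", big)
    fits_in_4_bytes = True
except struct.error:
    # '!i' is always 4 bytes, whatever the machine's word size
    fits_in_4_bytes = False

wide = struct.pack("!q", big)            # 8-byte signed field
(restored,) = struct.unpack("!q", wide)
```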

def enc_long_type(obj):
    return "%s%s%s" % ("B", pack("!L", len(str(obj))), str(obj))

There's no need to compute str(long) twice -- for large longs
it takes a lot of work to convert to base 10.  For that matter,
it's faster to convert to hex, and the hex form is more compact.
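
Something along these lines, as a sketch for a current Python 3. The names enc_long_hex and dec_long_hex are hypothetical, and I'm assuming the same "B" type code and "!L" length prefix as in your encoder:

```python
from struct import pack, unpack

def enc_long_hex(obj):
    # Convert once, and to hex rather than base 10.
    digits = format(obj, "x").encode("ascii")   # 255 -> b"ff", -255 -> b"-ff"
    return b"B" + pack("!L", len(digits)) + digits

def dec_long_hex(payload):
    # Skip the 1-byte type code, read the 4-byte length, parse the hex digits.
    (count,) = unpack("!L", payload[1:5])
    return int(payload[5:5 + count].decode("ascii"), 16)

n = 10 ** 100
hex_len = len(format(n, "x"))   # digits needed in base 16
dec_len = len(str(n))           # digits needed in base 10
```

Comparing hex_len with dec_len shows the size difference for a large value; timing the two conversions is left to you.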

Every decode you do requires several function calls.  While
less elegant, you'll likely get better performance (test it!)
if you minimize that; try something like this

import struct
from StringIO import StringIO

def decode(data):
    return _decode(StringIO(data).read)

# 'unpack' is bound as a default argument so the lookup is a fast local.
def _decode(read, unpack = struct.unpack):
    code = read(1)
    if not code:
        raise IOError("reached the end of the file")
    if code == "I":
        return unpack("!i", read(4))[0]
    if code == "F":
        return unpack("!f", read(4))[0]
    if code == "L":
        # unpack returns a tuple, so take element 0
        count = unpack("!i", read(4))[0]
        return [_decode(read) for i in range(count)]
    if code == "D":
        count = unpack("!i", read(4))[0]
        # assumes each entry decodes to a (key, value) tuple
        return dict([_decode(read) for i in range(count)])
    ...
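
To act on the "test it!" part, here is a rough, self-contained sketch for a current Python 3 that times the single-function decoder against a matching encoder. The encode function, the sample data and the repeat count are my own assumptions for illustration. It covers only ints, floats and lists, and it uses "!d" rather than "!f" so that floats round-trip exactly.

```python
import struct
import timeit
from io import BytesIO

def encode(obj, pack=struct.pack):
    # Hypothetical encoder, just enough to produce test data.
    if isinstance(obj, int):
        return b"I" + pack("!i", obj)
    if isinstance(obj, float):
        return b"F" + pack("!d", obj)
    if isinstance(obj, list):
        return b"L" + pack("!i", len(obj)) + b"".join([encode(x) for x in obj])
    raise TypeError("unsupported type: %r" % type(obj))

def decode(data):
    return _decode(BytesIO(data).read)

def _decode(read, unpack=struct.unpack):
    # One function handles every type code, so each value costs
    # one Python-level call instead of several.
    code = read(1)
    if not code:
        raise IOError("reached the end of the file")
    if code == b"I":
        return unpack("!i", read(4))[0]
    if code == b"F":
        return unpack("!d", read(8))[0]
    if code == b"L":
        count = unpack("!i", read(4))[0]
        return [_decode(read) for i in range(count)]
    raise ValueError("unknown type code: %r" % code)

sample = [1, 2.5, [3, 4, [5.0]]] * 100
data = encode(sample)

# Seconds for 20 runs each; compare the two numbers on your own machine.
enc_time = timeit.timeit(lambda: encode(sample), number=20)
dec_time = timeit.timeit(lambda: decode(data), number=20)
```

Printing enc_time and dec_time before and after a change tells you whether the change actually helped.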



				Andrew
				dalke at dalkescientific.com



