best(fastest) way to send and get lists from files

Nick Craig-Wood nick at craig-wood.com
Tue Feb 5 10:30:05 EST 2008


Abrahams, Max <Max_Abrahams at brown.edu> wrote:
> 
>  I've looked into pickle, dump, load, save, readlines(), etc.
> 
>  Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
> 
>  Binary and text files are both okay, text would be preferred in
>  general unless there's a significant speed boost from something
>  binary.

You could try the marshal module, which is very fast, lightweight and
built in.

  http://www.python.org/doc/current/lib/module-marshal.html

It produces a binary format, though, and it will only dump "simple"
objects (see the page above).  I believe it is what Python uses
internally to write .pyc files from .py files.
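
To make the "simple objects" restriction concrete, here is a minimal
sketch.  It is written in Python 3 syntax, unlike the script further
down, and the names Point and can_marshal are made up for illustration
only.  It assumes marshal raises ValueError for types it does not
support, which is what the module documentation describes.

```python
import marshal


class Point:
    """Hypothetical user-defined class, used only to show what marshal rejects."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def can_marshal(obj):
    """Return True if marshal.dumps accepts obj, False if it raises ValueError."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        return False


# Built-in "simple" types (numbers, strings, tuples, lists, dicts, None)
# survive a dumps/loads round trip.
simple = [1, 2.5, "text", (3, 4), {"key": None}]
round_tripped = marshal.loads(marshal.dumps(simple))
```

With this, can_marshal(simple) should be true and can_marshal(Point(1, 2))
false, so a list of plain numbers or strings is fine, but a list of
instances of your own classes would need pickle or a hand-written text
format instead.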

------------------------------------------------------------
#!/usr/bin/python

import os
from marshal import dump, load
from timeit import Timer

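def write(N, file_name = "z.marshal"):
    # Dump the list [0, 1, ..., N-1] to file_name in marshal's binary format
    L = range(N)
    out = open(file_name, "wb")
    dump(L, out)
    out.close()
    print "Written %d bytes for list size %d" % (os.path.getsize(file_name), N)

def read(N, file_name = "z.marshal"):
    # Load the list back and check that it has the expected length
    inp = open(file_name, "rb")
    L = load(inp)
    inp.close()
    assert len(L) == N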

for log_N in range(7):
    N = 10**log_N
    loops = 10
    write(N)
    print "Read back %d items in" % N, Timer("read(%d)" % N, "from __main__ import read").repeat(1, loops)[0]/loops, "s"
------------------------------------------------------------

Produces

$ ./test-marshal.py
Written 10 bytes for list size 1
Read back 1 items in 4.14133071899e-05 s
Written 55 bytes for list size 10
Read back 10 items in 4.31060791016e-05 s
Written 505 bytes for list size 100
Read back 100 items in 8.23020935059e-05 s
Written 5005 bytes for list size 1000
Read back 1000 items in 0.000352478027344 s
Written 50005 bytes for list size 10000
Read back 10000 items in 0.00165479183197 s
Written 500005 bytes for list size 100000
Read back 100000 items in 0.0175776958466 s
Written 5000005 bytes for list size 1000000
Read back 1000000 items in 0.175704598427 s
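
For anyone on Python 3, the same measurement can be sketched roughly as
below.  This is an adaptation, not the script that produced the numbers
above: the benchmark function and its max_log_n and loops parameters
are my own illustrative names, the file is written to a temporary
directory, and sizes and timings will differ by machine and Python
version.

```python
import marshal
import os
import tempfile
from timeit import Timer


def write(items, file_name):
    """Dump items to file_name with marshal and return the file size in bytes."""
    with open(file_name, "wb") as out:
        marshal.dump(items, out)
    return os.path.getsize(file_name)


def read(file_name):
    """Load and return the object stored in file_name."""
    with open(file_name, "rb") as inp:
        return marshal.load(inp)


def benchmark(max_log_n=4, loops=10):
    """Return (N, bytes_written, seconds_per_read) for N = 1, 10, ..., 10**max_log_n."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        file_name = os.path.join(tmp, "z.marshal")
        for log_n in range(max_log_n + 1):
            n = 10 ** log_n
            items = list(range(n))  # range() is lazy in Python 3, so build a real list
            size = write(items, file_name)
            assert read(file_name) == items  # sanity check before timing
            seconds = Timer(lambda: read(file_name)).timeit(loops) / loops
            results.append((n, size, seconds))
    return results


if __name__ == "__main__":
    for n, size, seconds in benchmark():
        print("N=%d: %d bytes, read back in %.3g s" % (n, size, seconds))
```

Raising max_log_n to 6 covers the same range of list sizes as the run
above.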

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


