Populating huge data structures from disk

Hrvoje Niksic hniksic at xemacs.org
Tue Nov 6 16:44:04 EST 2007


"Michael Bacarella" <mbac at gpshopper.com> writes:

> cPickle with protocol 2 has some promise but is more complicated because
> arrays can't be pickled.

This is not true:

>>> import array
>>> a = array.array('L')
>>> a.extend(xrange(10))
>>> a
array('L', [0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L])
>>> import cPickle as pickle
>>> s = pickle.dumps(a, -1)
>>> pickle.loads(s)
array('L', [0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L])

But I don't think unpickling will be any faster than array.fromstring.
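For what it's worth, here's a small sketch of that comparison (not from the original post; written with the Python 3 spellings, i.e. pickle for cPickle and frombytes for the then-current fromstring). Both routes reconstruct the identical array from the same underlying bytes:

```python
# Sketch: two ways to rebuild an array from serialized data.
import array
import pickle

a = array.array('L', range(10))

# Route 1: pickle roundtrip (binary protocol 2, as discussed above).
restored = pickle.loads(pickle.dumps(a, 2))

# Route 2: raw machine bytes, reloaded with frombytes
# (named fromstring in the Python 2 of this thread).
b = array.array('L')
b.frombytes(a.tobytes())

assert restored == a == b
```

Either way the payload is essentially the array's raw item buffer, which is why there's no reason to expect unpickling to beat the direct byte copy.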

Anyway, why not use array.fromfile instead?  It's actually *faster*
than the C code you posted:

$ time ./eat80    # not eat800 because I didn't feel like waiting
./eat80  0.58s user 2.43s system 93% cpu 3.226 total

$ cat eat80.py
#!/usr/bin/python
import array
a = array.array('L')
f = open('/dev/zero')
a.fromfile(f, 10000000)
print len(a)

$ time ./eat80.py
10000000
./eat80.py  0.02s user 0.00s system 48% cpu 0.058 total
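One caveat if you go the fromfile route: when the file holds fewer items than you asked for, fromfile raises EOFError, but the items it did manage to read are still appended to the array. A quick sketch (using io.BytesIO as a stand-in for a real file, which works in Python 3):

```python
# Sketch: fromfile's behavior at EOF.
import array
import io

# A "file" containing only 5 items' worth of bytes.
buf = io.BytesIO(array.array('L', range(5)).tobytes())

a = array.array('L')
try:
    a.fromfile(buf, 10)   # ask for more items than are available
except EOFError:
    pass                  # expected: the file ran out early

print(len(a))             # -> 5; the available items were kept
```

So a short read isn't silently truncated or lost, but you do need the try/except if the item count isn't known exactly in advance.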
