file IO
Jeff Epler
jepler at unpythonic.net
Mon Aug 2 22:34:47 EDT 2004
Are you using Windows? If so, the answer is almost certainly
"something to do with carriage returns and binary vs. text mode". The
missing trailing newline on the last line of your example can cause
additional trouble. My tests on Unix with stdio, mmap, and StringIO
never gave me a 4-byte file, but Windows might give you a file that
reads as "a\r\nb" when viewed in binary mode and "a\nb" when viewed in
text mode.
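A minimal sketch of the binary vs. text difference described above, assuming Python 3. The helper names are hypothetical, and passing newline="\r\n" to open() is only a way to imitate Windows text-mode writes on any platform, not something taken from the original example:

```python
import os
import tempfile


def write_like_windows_text_mode(path, text):
    # newline="\r\n" makes Python translate each "\n" to "\r\n" on write.
    # This imitates Windows text mode so the sketch behaves the same on
    # every platform (an assumption made for illustration).
    with open(path, "w", newline="\r\n") as f:
        f.write(text)


def read_binary(path):
    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        return f.read()


def read_text(path):
    # Default text mode translates "\r\n" back to "\n" on read.
    with open(path, "r") as f:
        return f.read()


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two lines, no trailing newline on the last one.
        write_like_windows_text_mode(path, "a\nb")
        return read_binary(path), read_text(path), os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The three characters written come back as four bytes in binary mode, which may be where a 4-byte file comes from.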
I doubt that the mmap module's readline knows whether the file was
opened in universal newline mode. That mode is a pure Python invention,
while mmap takes a file descriptor.
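A small sketch of that point, assuming Python 3 (where mmap.readline returns bytes). The function names are hypothetical. It writes a file with "\r\n" line endings, then reads it once through mmap and once through an ordinary text-mode file object:

```python
import mmap
import os
import tempfile


def mmap_lines(path):
    # mmap works on the raw file descriptor, so readline() returns the
    # bytes as stored, with no newline translation.
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # readline() returns b"" at end of map, which stops iter().
            return list(iter(m.readline, b""))
        finally:
            m.close()


def text_lines(path):
    # Default text mode translates "\r\n" to "\n" while reading.
    with open(path, "r") as f:
        return list(f)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"a\r\nb\r\n")
        return mmap_lines(path), text_lines(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The mmap lines keep their "\r\n" endings, while the text-mode lines end in "\n" only.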
On Unix, I don't find that a "while" loop with mmap.readline is any
faster than a "for" loop over a file. In the timings below it is
roughly three times slower:
[45426 lines, 409305 bytes]
$ timeit -s "..." "readspeed.read_stdio('/usr/share/dict/words')"
10 loops, best of 3: 34.9 msec per loop
$ timeit -s "..." "readspeed.read_mmap('/usr/share/dict/words')"
10 loops, best of 3: 107 msec per loop
[363416 lines, 3274440 bytes]
$ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
real 0.372s user 0.331s sys 0.031s
$ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
real 1.080s user 1.013s sys 0.021s
[2907328 lines, 26195520 bytes]
$ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
real 2.603s user 2.308s sys 0.157s
$ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
real 8.514s user 7.893s sys 0.153s
I didn't have any "bigger-than-RAM text files" around to test.
Testing "biggerfile.txt" with mode "rU" gives real 3.110s, so there is
some penalty from using universal newlines.
------------------------------------------------------------------------
# readspeed.py
from mmap import mmap, PROT_READ
import itertools, os

def consume(iterable):
    # Exhaust the iterable, discarding every item.
    for j in iterable: pass

def read_stdio(filename):
    f = open(filename) # open(filename, "rU") is slightly slower
    consume(f)

def read_mmap(filename):
    f = open(filename)
    fd = f.fileno()
    # Map the whole file read-only (PROT_READ is available on Unix only).
    m = mmap(fd, os.fstat(fd).st_size, prot=PROT_READ)
    while 1:
        if not m.readline(): break
------------------------------------------------------------------------
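For anyone who wants to repeat the comparison on a current interpreter, here is a hedged sketch of the same two read loops for Python 3, timed with the standard timeit module. The function names, the generated sample file, and the repeat count are all assumptions for illustration. It makes no claim about which loop is faster on any given system:

```python
import mmap
import os
import tempfile
import timeit


def count_lines_file(path):
    # "for" loop over a file object opened in binary mode.
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def count_lines_mmap(path):
    # Repeated mmap.readline() calls until it returns b"" at the end.
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return sum(1 for _ in iter(m.readline, b""))
        finally:
            m.close()


def make_sample(n_lines=1000):
    # Hypothetical stand-in for /usr/share/dict/words.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"word\n" * n_lines)
    return path


def compare(path, repeat=3):
    # Best-of-N wall-clock time in seconds for each approach.
    t_file = min(timeit.repeat(lambda: count_lines_file(path),
                               number=1, repeat=repeat))
    t_mmap = min(timeit.repeat(lambda: count_lines_mmap(path),
                               number=1, repeat=repeat))
    return t_file, t_mmap


if __name__ == "__main__":
    sample = make_sample(100000)
    try:
        print("file: %.4fs  mmap: %.4fs" % compare(sample))
    finally:
        os.remove(sample)
```

Both functions count lines so that the two loops can be checked against each other before their timings are compared.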