file IO
Chris
records2010 at yahoo.com
Tue Aug 3 22:31:17 EDT 2004
Jeff Epler <jepler at unpythonic.net> wrote in message news:<mailman.1080.1091500509.5135.python-list at python.org>...
> Are you using Windows? That would mean the answer is almost certainly
> "something to do with carriage returns and binary vs text mode". The
> lack of a trailing newline on the last line of your example can also
> make for additional trouble (though my tests on unix, with stdio, mmap,
> and StringIO didn't ever give me a 4-byte file, windows might give you
> the file "a\r\nb" when viewed in binary format, "a\nb" when viewed in
> text format)
>
> I doubt that the mmap module's readline knows whether the file was
> opened in universal text mode---that's a pure Python invention, while
> mmap takes a file descriptor.
>
> On Unix, I don't find that a "while" loop with mmap.readline is any
> faster than a "for" loop over a file:
>
> [45426 lines, 409305 bytes]
> $ timeit -s "..." "readspeed.read_stdio('/usr/share/dict/words')"
> 10 loops, best of 3: 34.9 msec per loop
> $ timeit -s "..." "readspeed.read_mmap('/usr/share/dict/words')"
> 10 loops, best of 3: 107 msec per loop
>
> [363416 lines, 3274440 bytes]
> $ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
> real 0.372s user 0.331s sys 0.031s
> $ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
> real 1.080s user 1.013s sys 0.021s
>
> [2907328 lines, 26195520 bytes]
> $ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
> real 2.603s user 2.308s sys 0.157s
> $ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
> real 8.514s user 7.893s sys 0.153s
>
> I didn't have any "bigger-than-RAM text files" around to test.
>
> Testing "biggerfile.txt" with mode "rU" gives real 3.110s, so there is
> some penalty from using universal newlines.
>
> ------------------------------------------------------------------------
> # readspeed.py
> from mmap import mmap, PROT_READ
> import itertools, os
>
> def consume(iterable):
>     for j in iterable: pass
>
> def read_stdio(filename):
>     f = open(filename) # open(filename, "rU") is slightly slower
>     consume(f)
>
> def read_mmap(filename):
>     f = open(filename)
>     fd = f.fileno()
>     m = mmap(fd, os.fstat(fd).st_size, prot=PROT_READ)
>     while 1:
>         if not m.readline(): break
> ------------------------------------------------------------------------
I've run into the same thing in C, now that I'm forced to work under XP
(thank you, Cygwin!).
Open the file in binary mode ('rb' or 'r+b') and you avoid the entire
issue of newline translation: the bytes you read are the bytes on disk.
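
For what it's worth, here is a minimal sketch of the difference, assuming
a current Python 3 interpreter (where text mode translates newlines on
every platform, not only on Windows). The helper name write_and_read and
the 4-byte payload are made up for the illustration; they are not from
the original thread.

```python
import os
import tempfile

def write_and_read(payload):
    """Write payload in binary mode, then read it back in binary and text mode.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # 'wb' writes the bytes exactly as given, with no newline translation.
        with open(path, "wb") as f:
            f.write(payload)
        # 'rb' returns the bytes exactly as they are stored on disk.
        with open(path, "rb") as f:
            raw = f.read()
        # Default text mode applies universal-newline translation on read.
        with open(path, "r", encoding="ascii") as f:
            text = f.read()
        return raw, text, os.path.getsize(path)
    finally:
        os.remove(path)

raw, text, size = write_and_read(b"a\r\nb")
print(repr(raw), repr(text), size)
```

The binary read comes back as the same 4 bytes that are on disk, while
the text read has the "\r\n" folded into a single "\n", so anything that
counts bytes or offsets (mmap included) should stick to binary mode.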