file IO

Chris records2010 at yahoo.com
Tue Aug 3 22:31:17 EDT 2004


Jeff Epler <jepler at unpythonic.net> wrote in message news:<mailman.1080.1091500509.5135.python-list at python.org>...
> Are you using Windows?  That would mean the answer is almost certainly
> "something to do with carriage returns and binary vs text mode".  The
> lack of a trailing newline on the last line of your example can also
> make for additional trouble. My tests on Unix with stdio, mmap, and
> StringIO never gave me a 4-byte file, but Windows might give you the
> file "a\r\nb" when viewed in binary mode and "a\nb" when viewed in
> text mode.
> 
> I doubt that the mmap module's readline knows whether the file was
> opened in universal newlines mode. That's a pure Python invention,
> while mmap takes a file descriptor.
> 
> On Unix, I don't find that a "while" loop with mmap.readline is any
> faster than a "for" loop over a file:
> 
> [45426 lines, 409305 bytes]
> $ timeit -s "..." "readspeed.read_stdio('/usr/share/dict/words')"
> 10 loops, best of 3: 34.9 msec per loop
> $ timeit -s "..." "readspeed.read_mmap('/usr/share/dict/words')"
> 10 loops, best of 3: 107 msec per loop
> 
> [363416 lines, 3274440 bytes]
> $ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
> real 0.372s  user 0.331s  sys 0.031s
> $ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
> real 1.080s  user 1.013s  sys 0.021s
> 
> [2907328 lines, 26195520 bytes]
> $ time python -c "import readspeed; readspeed.read_stdio('biggerfile.txt')"
> real 2.603s  user 2.308s  sys 0.157s
> $ time python -c "import readspeed; readspeed.read_mmap('biggerfile.txt')"
> real 8.514s  user 7.893s  sys 0.153s
> 
> I didn't have any "bigger-than-RAM text files" around to test.
> 
> Testing "biggerfile.txt" with mode "rU" gives real 3.110s, so there is
> some penalty from using universal newlines.
> 
> ------------------------------------------------------------------------
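> # readspeed.py
> import os
> from mmap import mmap, ACCESS_READ
>
> def consume(iterable):
>     # Exhaust an iterator without keeping its items.
>     for _ in iterable:
>         pass
>
> def read_stdio(filename):
>     # Iterate line by line over a buffered file object.
>     with open(filename) as f:  # open(filename, "rU") is slightly slower
>         consume(f)
>
> def read_mmap(filename):
>     # Map the file read-only and call readline() until it returns b"".
>     # access=ACCESS_READ is portable; prot=PROT_READ exists only on Unix.
>     with open(filename, "rb") as f:
>         fd = f.fileno()
>         with mmap(fd, os.fstat(fd).st_size, access=ACCESS_READ) as m:
>             while m.readline():
>                 pass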
> ------------------------------------------------------------------------
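Jeff's point about mmap is easy to check. Here is a minimal Python 3 sketch, assuming a throwaway temporary file and a hypothetical helper `mmap_lines` (neither comes from the thread). mmap works on the raw file descriptor, so readline() hands back bytes exactly as stored: a file with Windows line endings yields lines ending in b"\r\n" no matter how the file object was opened.

```python
import mmap
import os
import tempfile

def mmap_lines(path):
    """Hypothetical helper: return every line of `path` as raw bytes via mmap.readline()."""
    lines = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ works on Unix and Windows.
        # Note: mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            while True:
                line = m.readline()
                if not line:  # b"" signals the end of the mapping
                    break
                lines.append(line)
    return lines

# Write the bytes a Windows editor would produce for "a" and "b" on two lines.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"a\r\nb")

print(mmap_lines(path))  # [b'a\r\n', b'b']
os.remove(path)
```

Nothing in this path translates line endings, which is the same behaviour you get from a file opened in binary mode.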


I've run into the same carriage-return problem in C, now that I'm forced
to work under Windows XP (thank you, Cygwin!).

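Open the file in binary mode ('rb' or 'r+b') and you sidestep newline
translation entirely: you see the bytes exactly as they are on disk,
"\r\n" included, and can strip the "\r" yourself where it matters.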
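For the Python side, a minimal Python 3 sketch of the difference, assuming a temporary file and a hypothetical helper `read_both` (not from the thread). The file holds the bytes b"a\r\nb"; binary mode returns them untouched, while text mode with universal newlines (newline=None, the Python 3 default) turns "\r\n" into "\n". In binary mode, stripping the line endings is up to you.

```python
import os
import tempfile

def read_both(path):
    """Hypothetical helper: return (binary contents, text contents) of `path`."""
    with open(path, "rb") as f:
        raw = f.read()    # no newline translation at all
    with open(path, "r", newline=None) as f:
        text = f.read()   # universal newlines: "\r\n" and "\r" become "\n"
    return raw, text

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"a\r\nb")

raw, text = read_both(path)
print(repr(raw))   # b'a\r\nb'
print(repr(text))  # 'a\nb'

# In binary mode, strip line endings yourself.
with open(path, "rb") as f:
    lines = [line.rstrip(b"\r\n") for line in f]
print(lines)       # [b'a', b'b']
os.remove(path)
```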


