Seek the one billionth line in a file containing 3 billion lines.

Sullivan WxPyQtKinter sullivanz.pku at gmail.com
Wed Aug 8 02:41:37 EDT 2007


On Aug 8, 2:35 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Sullivan WxPyQtKinter <sullivanz.... at gmail.com> writes:
> > This program:
> > for i in range(1000000000):
> >       f.readline()
> > is absolutely very slow....
>
> There are two problems:
>
>  1) range(1000000000) builds a list of a billion elements in memory,
>     which is many gigabytes and probably thrashing your machine.
>     You want to use xrange instead of range, which builds an iterator
>     (i.e. something that uses just a small amount of memory, and
>     generates the values on the fly instead of precomputing a list).
>
>  2) f.readline() reads an entire line of input which (depending on
>     the nature of the log file) could also be of very large size.
>     If you're sure the log file contents are sensible (lines up to
>     several megabytes shouldn't cause a problem) then you can do it
>     that way, but otherwise you want to read fixed size units.


Thank you for pointing out these two problems. I wrote this program
just to show how inefficient the naive way of seeking in such a big
file is. No other intention........
