Seek the one billionth line in a file containing 3 billion lines.

Paul Rubin http
Wed Aug 8 02:35:20 EDT 2007


Sullivan WxPyQtKinter <sullivanz.pku at gmail.com> writes:
> This program:
> for i in range(1000000000):
>       f.readline()
> is absolutely very slow....

There are two problems: 

 1) range(1000000000) builds a list of a billion elements in memory,
    which is many gigabytes and is probably making your machine thrash.
    You want to use xrange instead of range, which produces the values
    lazily (i.e. it uses just a small amount of memory, and generates
    the values on the fly instead of precomputing a list).

 2) f.readline() reads an entire line of input, which (depending on
    the nature of the log file) could itself be very large.
    If you're sure the log file contents are sensible (lines up to
    several megabytes shouldn't cause a problem) then you can do it
    that way, but otherwise you want to read fixed-size units.
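
Here is a rough sketch of both suggestions. It is an illustration, not
code from the original thread. The names skip_lines, offset_after_lines
and chunk_size are made up for this example. It is written for Python 3
and assumes the file is opened in binary mode and positioned at offset 0.
The first function skips lines lazily without building a list of
indices. The second reads fixed-size chunks and counts newlines, so a
single enormous line never has to fit in memory.

```python
import io
from itertools import islice


def skip_lines(f, n):
    """Advance file object f past n lines, holding one line at a time."""
    # islice pulls lines lazily; nothing is accumulated.
    for _ in islice(f, n):
        pass


def offset_after_lines(f, n, chunk_size=1 << 20):
    """Return the byte offset just past the n-th newline in f.

    Assumes f is a binary file positioned at offset 0.
    Returns None if the file has fewer than n newlines.
    """
    if n <= 0:
        return 0
    seen = 0   # newlines counted in earlier chunks
    base = 0   # byte offset where the current chunk starts
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return None
        count = chunk.count(b"\n")
        if seen + count >= n:
            # The target newline is inside this chunk; walk to it.
            pos = -1
            for _ in range(n - seen):
                pos = chunk.find(b"\n", pos + 1)
            return base + pos + 1
        seen += count
        base += len(chunk)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a log file.
    sample = io.BytesIO(b"a\nbb\nccc\ndddd\n")
    offset = offset_after_lines(sample, 2, chunk_size=3)
    sample.seek(offset)
    print(sample.readline())
```

With a real file you would open it with open(path, "rb"), call
offset_after_lines(f, 1000000000), and then f.seek() to the returned
offset before reading. Either way the whole first third of the file
still has to be scanned once, since line boundaries are not known in
advance.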


