Seek the one billionth line in a file containing 3 billion lines.

Peter Otten __peter__ at web.de
Wed Aug 8 02:52:20 EDT 2007


Sullivan WxPyQtKinter wrote:

> I have a huge log file which contains 3,453,299,000 lines with
> different lengths. It is not possible to calculate the absolute
> position of the beginning of the one billionth line. Is there an
> efficient way to seek to the beginning of that line in Python?
> 
> This program:
> for i in range(1000000000):
>       f.readline()
> is absolutely very slow....
> 
> Thank you so much for help.

That will be slow regardless of language, since every line before the
wanted one has to be read to find where it starts. However

import itertools
import sys

n = 10**9 - 1  # zero-based index of the one billionth line
assert n < sys.maxint
f = open(filename)
wanted_line = itertools.islice(f, n, None).next()

should do slightly better than your implementation.
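
For readers on Python 3, where sys.maxint and the .next() method no
longer exist, the same idea might look like the sketch below. The
function names nth_line and offset_of_line are made up for this
illustration, and the second function is a different technique from the
islice call above: it counts newline bytes in large binary chunks to
find the byte offset of the line, on the assumption that lines end in
"\n". Both still read everything before the wanted line, so neither
avoids the linear scan.

```python
import itertools


def nth_line(path, n):
    """Return line n (1-based) as bytes, or None if the file is shorter."""
    with open(path, "rb") as f:
        # islice skips the first n - 1 lines without a Python-level loop.
        return next(itertools.islice(f, n - 1, None), None)


def offset_of_line(path, n, chunk_size=1 << 20):
    """Return the byte offset where line n (1-based) starts, or None.

    Hypothetical helper: assumes lines are terminated by b"\\n".
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    remaining = n - 1  # newlines to pass before line n begins
    offset = 0         # bytes consumed by fully skipped chunks
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # file ended before line n
            count = chunk.count(b"\n")
            if count < remaining:
                remaining -= count
                offset += len(chunk)
                continue
            # The last newline we need is inside this chunk; find it.
            pos = -1
            for _ in range(remaining):
                pos = chunk.find(b"\n", pos + 1)
            start = offset + pos + 1
            # Line n exists only if at least one byte follows that newline.
            f.seek(start)
            return start if f.read(1) else None
```

With the offset in hand, f.seek(offset) on a file opened in binary mode
positions the file at the start of the wanted line, and the offset can
be stored so that later runs skip the scan entirely.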

Peter




