iterating over large files

Alex Martelli aleaxit at yahoo.com
Mon Apr 9 11:40:57 EDT 2001


"Paul Brian" <pbrian at demon.net> wrote in message
news:986828874.4669.0.nnrp-13.c1c3e154 at news.demon.co.uk...
> Dear all,
>
> I am trying to iterate line by line over a large text file, using
> readlines().

In Python 2.1, you can use the .xreadlines() method to get the
same effect as .readlines() (for the typical purpose of iterating
over the lines in a for loop) while using only a bounded amount of
memory (internally, it wraps repeated calls to .readlines with a
size-hint).
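
A minimal sketch of that (assuming Python 2.1 and a hypothetical
file name 'big.txt'):

myfile = open('big.txt')
for line in myfile.xreadlines():    # iterates lazily, chunk by chunk
    print line,                     # trailing comma: line already ends in \n
myfile.close()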


> y = 1
> while y <  4000:
>     for line in myfile.readlines(1024):
>         print y , line
>     y = y + 1
>
> myfile.close()
>
> However it is rather ugly and assumes that I know how much data
> readlines() will actually take (it depends on an internal buffer
> according to the manual. Not too hot on them), and how big the
> file is.

What about:

y = 1
while 1:
    lines = myfile.readlines(1024)    # read a batch of whole lines (~1024 bytes)
    if not lines: break               # readlines returns [] at end-of-file
    for line in lines:
        print y, line,
    y += 1

Not sure why you want to read only about 1024 bytes at a time (the
argument is just a hint, and surely you can afford a somewhat larger
buffer...?), but, anyway, the idea is that .readlines() returns an
empty list once the file is at EOF, so we can test for that with the
'if not lines' guard.

[Note that each line ends with a \n, so you would 'double-space' the
output if you passed it to print just like that -- which is why I
added the trailing comma in the print statement, assuming the double
space effect is not in fact desired].
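
(If you would rather keep print's own newline and strip the one on
each line instead -- a minimal alternative sketch, assuming the
string methods available since 2.0:)

for line in lines:
    print y, line.rstrip()    # rstrip drops the \n (and any other trailing whitespace)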


Alex
