Beginner question : skips every second line in file when using readline()

Pettersen, Bjorn S BjornPettersen at fairisaac.com
Mon Oct 20 18:49:20 EDT 2003


me:
[..idiomatic..]
> 
>   for line in file(datafile):
>       ..do stuff..
> 

paul:
> Does this cause the entire input file to be read into memory 
> before the for loop begins execution?

Nope. It reads the file in 'appropriately sized' chunks, so it is more
space-efficient than file(..).read().split('\n') [i.e. reading the
entire file into memory], and more time-efficient than reading only
enough bytes to satisfy one line... In other words, it's all-around
better than you could do yourself unless you spent more time on this
than you should <wink>.
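
As a rough illustration of the point above (a minimal sketch, not taken from the original thread): iterating over the file object hands back one line at a time, so only the current line plus an internal read buffer is held in memory. The helper name count_lines and the temporary sample file are made up for the example, and open() is used in place of file() so the snippet also runs on later Python versions.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object, one line at a time."""
    total = 0
    with open(path) as handle:
        for _line in handle:  # each iteration yields a single line
            total += 1
    return total


# Build a small throwaway sample file (hypothetical data) to run against.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for i in range(1000):
        out.write("record %d\n" % i)

line_count = count_lines(sample_path)
os.remove(sample_path)
print(line_count)
```

The same loop shape works unchanged on a multi-gigabyte file, since nothing in it accumulates the lines.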

> This is great for reading 5 lines, but I might need to read 
> 30 million lines from a mortgage company file.  I cannot 
> read the entire file into memory.

I've done 20+ GB files (they take forever [which is mostly not a Python
issue], but don't run out of memory).

-- bjorn

ps: does anyone know if there's a way to adjust the chunk size when you
know what's most appropriate? (e.g. empirically, I know that it is very
close to 150K on this machine accessing local disks...)
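
One possible direction, as a hedged sketch only: in current Python versions the built-in open() takes a buffering argument, and for a binary-mode file an integer greater than 1 is used as the buffer size in bytes. Whether this changes anything for the file iterator of the Python version discussed in this thread is an assumption I have not checked. The 150K figure comes from the question above; the names CHUNK_SIZE and count_lines_buffered are invented for the example.

```python
import os
import tempfile

# Assumed value, taken from the "very close to 150K" estimate above.
CHUNK_SIZE = 150 * 1024


def count_lines_buffered(path, chunk_size=CHUNK_SIZE):
    """Iterate over a binary-mode file opened with an explicit buffer size."""
    total = 0
    # buffering > 1 requests a buffer of that many bytes for the reader.
    with open(path, "rb", buffering=chunk_size) as handle:
        for _line in handle:  # lines come back as bytes in binary mode
            total += 1
    return total


# Throwaway sample file (hypothetical data) so the sketch runs on its own.
fd, sample_path = tempfile.mkstemp(suffix=".dat")
with os.fdopen(fd, "wb") as out:
    for i in range(5000):
        out.write(b"record %d\n" % i)

buffered_count = count_lines_buffered(sample_path)
os.remove(sample_path)
print(buffered_count)
```

Binary mode is used here because the text layer in newer Pythons adds its own chunking on top of the buffer; timing a few sizes on the real data would be the way to find out whether the setting helps at all.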




