Beginner question : skips every second line in file when using readline()
Pettersen, Bjorn S
BjornPettersen at fairisaac.com
Mon Oct 20 18:49:20 EDT 2003
me:
[..idiomatic..]
>
> for line in file(datafile):
>     ..do stuff..
>
paul:
> Does this cause the entire input file to be read into memory
> before the for loop begins execution?
Nope. It reads the file in 'appropriately sized' chunks, so it is more
space-efficient than file(..).read().split('\n') [i.e. reading the
entire file into memory], and more time-efficient than reading only
enough bytes to satisfy one line... In other words, it's all-around
better than you could do yourself unless you spent more time on this
than you should <wink>.
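To make the comparison concrete, here is a minimal sketch of the two approaches. It assumes current Python 3 spelling (open() instead of the file() builtin used above), and the function names are made up for illustration. Both compute the length of the longest line, but only the first keeps memory use independent of file size.

```python
def longest_line_streaming(path):
    """Iterate over the file object: lines are pulled from a buffer,
    so only a small part of the file is in memory at any time."""
    longest = 0
    with open(path) as f:
        for line in f:
            longest = max(longest, len(line.rstrip("\n")))
    return longest


def longest_line_slurp(path):
    """Read the whole file, then split: memory use grows with file size."""
    with open(path) as f:
        lines = f.read().split("\n")
    return max(len(line) for line in lines)
```

On a small file the two return the same answer; on a multi-gigabyte file the second one has to hold the entire contents (plus the list of lines) at once.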
> This is great for reading 5 lines, but I might need to read
> 30 million lines from a mortgage company file. I cannot
> read the entire file into memory.
I've done 20+ GB files (they take forever [which is mostly not a Python
issue], but don't run out of memory).
-- bjorn
ps: does anyone know if there's a way to adjust the chunk size when you
know what's most appropriate? (e.g. empirically, I know that it is very
close to 150K on this machine accessing local disks...)
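One possible workaround, offered as a sketch and not as a way to tune the file iterator itself: read fixed-size blocks yourself and split them into lines by hand, so the chunk size is whatever you pass in. This assumes Python 3 and binary mode; iter_lines and its chunk_size parameter are hypothetical names, and the 150K default is just the figure mentioned above.

```python
def iter_lines(path, chunk_size=150 * 1024):
    """Yield lines as bytes (without the trailing newline),
    reading at most chunk_size bytes from the file per read call."""
    tail = b""
    # buffering=0 gives unbuffered reads in binary mode, so each
    # read() below asks the OS for at most chunk_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (tail + chunk).split(b"\n")
            tail = lines.pop()  # last piece may be an incomplete line
            for line in lines:
                yield line
    if tail:
        yield tail  # final line had no trailing newline
```

Usage would look like `for line in iter_lines(datafile, chunk_size=150 * 1024): ...`. Whether this beats plain iteration over the file object is something to measure, not assume.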