[Baypiggies] reading very large files

Simeon Franklin simeonf at gmail.com
Tue May 17 19:56:28 CEST 2011


I missed the list too. Curse that reply button :)
On Tue, May 17, 2011 at 10:17 AM, Vikram K <kpguy1975 at gmail.com> wrote:
> I wish to read a large data file (file size is around 1.8 MB) and manipulate
> the data in this file. Just reading and writing the first 500 lines of this
> file is causing a problem. I wrote:
>
> fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
> count = 0
> for i in fin.readlines():
>     print i
>     count += 1
>     if count >= 500:
>         break

You don't need the readlines call - the file object itself supports
iteration over lines; readlines() is there if you specifically want to
create a list containing all the lines in the file. Try it with

for i in fin:

instead of

for i in fin.readlines():

and see... Were you mistaken above, and is the file size 1.8 GB instead
of MB? You shouldn't be having memory errors with 1.8 MB given a normal
environment. If you are working with multi-gigabyte files, however,
you should read David Beazley's awesome Generator Tricks paper
(http://www.dabeaz.com/generators-uk/). I re-read it on a regular
basis and always pick up something new...
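
For what it's worth, here is a rough sketch of the lazy approach, written
for Python 3 (your snippet is Python 2). The helper names first_lines and
split_fields are made up for illustration, and I'm only guessing from the
.tsv extension that the file is tab-separated:

```python
from itertools import islice


def first_lines(path, limit=500):
    """Yield at most `limit` lines from the file at `path`, one at a time."""
    with open(path) as fin:
        # Iterating over the file object pulls lines on demand; islice stops
        # after `limit` lines, so no list of every line is ever built.
        for line in islice(fin, limit):
            yield line.rstrip("\n")


def split_fields(lines, sep="\t"):
    """Turn each line into a list of fields (assumes tab-separated data)."""
    for line in lines:
        yield line.split(sep)
```

Looping over split_fields(first_lines('gene-GS00471-DNA_B01_1101_37-ASM.tsv'))
should then hand you one row at a time, and memory use stays flat no matter
how big the file turns out to be.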

-regards
Simeon Franklin

