[Baypiggies] reading very large files

Lucas Wiman lucas.wiman at gmail.com
Tue May 17 21:26:16 CEST 2011


On Tue, May 17, 2011 at 10:56 AM, <baypiggies-request at python.org> wrote:

>
> I wish to read a large data file (file size is around 1.8 MB) and manipulate
> the data in this file. Just reading and writing the first 500 lines of this
> file is causing a problem. I wrote:
>
> fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
> count = 0
> for i in fin.readlines():
>    print i
>    count += 1
>    if count >= 500:
>        break
>
> and got this error msg:
>
> Traceback (most recent call last):
>  File
>
> "H:\genome_4_omics_study\GS000003696-DID\GS00471-DNA_B01_1101_37-ASM\GS00471-DNA_B01\ASM\gene-GS00471-DNA_B01_1101_37-ASM.tsv\test.py",
> line 3, in <module>
>    for i in fin.readlines():
> MemoryError
>

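The immediate problem is that readlines() builds a list of every line in the
file in memory at once. Iterating over the file object itself reads one line
at a time instead. A minimal sketch of that change, keeping your original
filename and print loop:

fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
count = 0
for line in fin:  # the file object yields one line at a time
    print line
    count += 1
    if count >= 500:
        break
fin.close()
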
If your data is actually a tsv (tab-separated values) file, you should use the
csv module to iterate over its rows.  Just set the delimiter to '\t' and look
at the docs at http://docs.python.org/library/csv.html

You should also generally use the "with" syntax when dealing with files,
since it handles closing the file object for you (probably not an issue when
you're just reading from a single file, but a best practice nonetheless).
Here's how I would deal with your situation:

import csv

with open('gene-GS00471-DNA_B01_1101_37-ASM.tsv', 'r') as f:
    r = csv.reader(f, delimiter='\t')
    for row in r:
        # row is a list of strings that correspond to the columns in your file
        do_stuff_with_the_row(row)
# your file object f is now closed
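
Since you only wanted the first 500 lines, one way to cap the loop is
itertools.islice. A short sketch, reusing the do_stuff_with_the_row
placeholder from above:

import csv
import itertools

with open('gene-GS00471-DNA_B01_1101_37-ASM.tsv', 'r') as f:
    r = csv.reader(f, delimiter='\t')
    # islice stops after 500 rows without reading the rest of the file
    for row in itertools.islice(r, 500):
        do_stuff_with_the_row(row)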

Best wishes,
Lucas Wiman