[Baypiggies] reading very large files

Lucas Wiman lucas.wiman at gmail.com
Tue May 17 21:34:47 CEST 2011


It's also extremely surprising to me that reading a 1.8MB file is causing a
memory error.  That's not a particularly large file, and if it is causing a
memory error, there must be something wrong with your Python configuration
or build.
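
To narrow it down, here is a minimal sketch of what I would try first. It
checks the file's size on disk and then pulls only the first 500 lines by
iterating over the file object lazily, rather than calling readlines(), which
builds a list of every line in memory. The helper names first_lines and
file_size_bytes and the limit parameter are my own illustrative choices, not
anything from your script.

```python
import itertools
import os


def file_size_bytes(path):
    """Return the size on disk, to confirm the file really is about 1.8 MB."""
    return os.path.getsize(path)


def first_lines(path, limit=500):
    """Return up to `limit` lines from `path`, without trailing newlines."""
    with open(path) as f:
        # A file object yields one line at a time; islice stops after `limit`
        # lines, so the remaining lines are never collected into a list.
        return [line.rstrip('\n') for line in itertools.islice(f, limit)]
```

You would call it as first_lines('gene-GS00471-DNA_B01_1101_37-ASM.tsv') and
print each returned line. If even this raises MemoryError, the problem is
unlikely to be the size of the file.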

Best,
Lucas

On Tue, May 17, 2011 at 12:26 PM, Lucas Wiman <lucas.wiman at gmail.com> wrote:

>
>
> On Tue, May 17, 2011 at 10:56 AM, <baypiggies-request at python.org> wrote:
>
>>
>> I wish to read a large data file (file size is around 1.8 MB) and manipulate
>> the data in this file. Just reading and writing the first 500 lines of this
>> file is causing a problem. I wrote:
>>
>> fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
>> count = 0
>> for i in fin.readlines():
>>    print i
>>    count += 1
>>    if count >= 500:
>>        break
>>
>> and got this error msg:
>>
>> Traceback (most recent call last):
>>   File "H:\genome_4_omics_study\GS000003696-DID\GS00471-DNA_B01_1101_37-ASM\GS00471-DNA_B01\ASM\gene-GS00471-DNA_B01_1101_37-ASM.tsv\test.py", line 3, in <module>
>>     for i in fin.readlines():
>> MemoryError
>>
>
> If your data really is TSV (tab-separated values), you should use the csv
> module to iterate over its rows.  Just set the delimiter to '\t' and look
> at the docs at
> http://docs.python.org/library/csv.html
>
> You should also generally use the "with" syntax when dealing with files,
> since it handles closing the file object for you (probably not an issue when
> you're just reading from a single file, but best practice nonetheless).
> Here's how I would deal with your situation:
>
> import csv
>
> with open('gene-GS00471-DNA_B01_1101_37-ASM.tsv', 'r') as f:
>     r = csv.reader(f, delimiter='\t')
>     for row in r:
>         # row is a list of strings, one per column in your file
>         do_stuff_with_the_row(row)
> # your file object f is now closed
>
> Best wishes,
> Lucas Wiman
>