[Baypiggies] reading very large files

Brian Palmer bpalmer at gmail.com
Tue May 17 19:39:26 CEST 2011


[pardon; I accidentally replied only to Vikram at first]

On Tue, May 17, 2011 at 10:17 AM, Vikram K <kpguy1975 at gmail.com> wrote:

> I wish to read a large data file (file size is around 1.8 MB) and
> manipulate the data in this file. Just reading and writing the first 500
> lines of this file is causing a problem. I wrote:
>
[snip]

> Traceback (most recent call last):
>   File
> "H:\genome_4_omics_study\GS000003696-DID\GS00471-DNA_B01_1101_37-ASM\GS00471-DNA_B01\ASM\gene-GS00471-DNA_B01_1101_37-ASM.tsv\test.py",
> line 3, in <module>
>     for i in fin.readlines():
> MemoryError
>
> -------
> is there a way to stop python from slurping all the file contents at once?
>

Just don't use readlines(). Instead, if you're reading line by line, use the
file object as an iterable:

for line in fin:
  ...

You can dress this up a little with something like

import itertools

with open('gene-GS0001') as fin:
    for line in itertools.islice(fin, 500):
        print line
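
Since the goal is to read and then write out the first 500 lines, you can
combine that with an output file. A minimal sketch, where both file names
are just placeholders:

import itertools

# Copy the first 500 lines to a new file without loading the whole
# input into memory; the file names here are placeholders.
with open('gene-GS0001.tsv') as fin, open('first_500_lines.tsv', 'w') as fout:
    fout.writelines(itertools.islice(fin, 500))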

Note that this assumes the data file is line-oriented. Otherwise, you
should look at, e.g., using the read() method directly:

chunk = fin.read(512)
...
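
For example, a minimal sketch that processes the file in fixed-size chunks
(the 512-byte chunk size is arbitrary and the file name is a placeholder):

with open('gene-GS0001') as fin:
    while True:
        chunk = fin.read(512)
        # read() returns an empty string at end of file
        if not chunk:
            break
        # ... process chunk here ...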