Memory error due to the huge input file size

tejsupra at gmail.com
Mon Nov 10 16:47:49 EST 2008


Hello Everyone,

I need to read a .csv file that is 2.26 GB in size, and I wrote a
Python script to read it. My computer has 2 GB of RAM. Please see the
code below:

"""
This program has been developed to retrieve all the promoter sequences
for the specified list of genes in the given cluster.

So, this program will act as a substitute for the whole EZRetrieve
system.

Input arguments:

1) Cluster.txt or DowRatClust161718bwithDummy.txt
2) TransProCrossReferenceAndSequences.csv -> This is the file that has
all the promoter sequences
3) -2000
4) 500
"""

import time
import csv
import sys
import linecache
import re
from sets import Set
import gc

print time.localtime()

fileInputHandler = open(sys.argv[1],"r")
line = fileInputHandler.readline()

refSeqIDsinTransPro = []
promoterSequencesinTransPro = []
reader2 = csv.reader(open(sys.argv[2],"rb"))
reader2_list = []
reader2_list.extend(reader2)

for data2 in reader2_list:
   refSeqIDsinTransPro.append(data2[3])
for data2 in reader2_list:
   promoterSequencesinTransPro.append(data2[4])

while line:
   l = line.rstrip('\n')
   for j in range(1,len(refSeqIDsinTransPro)):
      found = re.search(l,refSeqIDsinTransPro[j])
      if found:
         """promoterSequencesinTransPro[j]  """
         print l

   line = fileInputHandler.readline()


fileInputHandler.close()


The error that I got is given as follows:
Traceback (most recent call last):
  File "RefSeqsToPromoterSequences.py", line 31, in <module>
    reader2_list.extend(reader2)
MemoryError

I understand that this is a memory error and that it is caused by the
line reader2_list.extend(reader2), which loads every row of the file
into a list at once. Is there an alternative way to read the .csv file
line by line?

Sincerely,
Suprabhath
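
One possible way to read the file row by line is sketched below. A
csv.reader object can be iterated over directly, so the rows never need
to be copied into a list. The sketch is written in Python 3 syntax,
while the script above is Python 2. The function names load_gene_ids
and iter_matching_promoters are hypothetical. It also assumes that each
line of the cluster file is a complete RefSeq ID, so it uses an exact
set lookup in place of the re.search call above; if substring or regex
matching is really needed, that test would have to be swapped back in.
Columns 3 and 4 are taken from the original script.

```python
import csv


def load_gene_ids(cluster_path):
    """Read one gene ID per line into a set, ignoring blank lines."""
    with open(cluster_path, "r") as handle:
        return {line.strip() for line in handle if line.strip()}


def iter_matching_promoters(csv_path, gene_ids, id_col=3, seq_col=4):
    """Yield (refseq_id, sequence) for each row whose ID is in gene_ids.

    The reader is consumed one row at a time, so memory use stays
    roughly constant no matter how large the CSV file is.
    """
    with open(csv_path, "r", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) <= max(id_col, seq_col):
                continue  # skip short or malformed rows
            if row[id_col] in gene_ids:  # exact match (an assumption)
                yield row[id_col], row[seq_col]


def main(argv):
    """Print every matching RefSeq ID, as the original script does."""
    wanted = load_gene_ids(argv[1])
    for refseq_id, _sequence in iter_matching_promoters(argv[2], wanted):
        print(refseq_id)
```

Calling main(sys.argv) after importing sys would run it with the same
first two arguments as the original script. Only the small cluster file
is held in memory. The set lookup also replaces the inner loop over all
IDs for every cluster line, so the large file is scanned once.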


