Out of memory while reading excel file

Mahmood Naderan nt_mahmood at yahoo.com
Wed May 10 13:11:54 EDT 2017


Well, actually the cells are treated as strings, not as integers or floats.

One way to work around this is to get the number of rows, split them into 4 or 5 arrays, and process those separately. However, I was looking for a better solution.
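
To show what I mean by splitting, here is a rough sketch (the chunk size of 20000 rows and the process_chunk helper are only placeholders, not code from my actual script):

import numpy
from openpyxl import load_workbook

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

CHUNK = 20000  # rows per block; chosen so each block fits comfortably in RAM


def process_chunk(block):
    # stand-in for the real per-block processing
    print(block.shape)


buffer = []
for row in ws.rows:
    # assumes every cell holds a number or a numeric string
    buffer.append([float(cell.value) for cell in row])
    if len(buffer) == CHUNK:
        process_chunk(numpy.array(buffer, dtype=float))
        buffer = []

if buffer:
    process_chunk(numpy.array(buffer, dtype=float))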

I have read that large Excel files are on the order of a million rows. Mine is about 100K rows. Currently, the task manager shows about 4 GB of RAM usage while working with numpy.

Regards,
Mahmood

--------------------------------------------
On Wed, 5/10/17, Peter Otten <__peter__ at web.de> wrote:

 Subject: Re: Out of memory while reading excel file
 To: python-list at python.org
 Date: Wednesday, May 10, 2017, 3:48 PM
 
 Mahmood Naderan via Python-list wrote:
 
 > Thanks for your reply. The openpyxl part (reading the workbook) works
 > fine. I printed some debug information and found that when it reaches
 > the np.array, after some 10 seconds, the memory usage goes high.
 >
 > So, I think numpy is unable to manage the memory.
 
 Hm, I think numpy is designed to manage huge arrays if you have enough
 RAM.
 
 Anyway: are all values of the same type? Then the numpy array may be
 kept much smaller than in the general case (I think). You can also
 avoid the intermediate list of lists:
 
 wb = load_workbook(filename='beta.xlsx', read_only=True)
 ws = wb['alpha']
 
 a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
 for y, row in enumerate(ws.rows):
     a[y] = [cell.value for cell in row]
 
 
 -- 
 https://mail.python.org/mailman/listinfo/python-list
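
For anyone trying the snippet above, a self-contained version follows. beta.xlsx and 'alpha' are just the example names used in the thread, and the guard for empty cells is an added assumption (empty cells come back as None):

import numpy
from openpyxl import load_workbook

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

# Fill one row of floats at a time instead of building an
# intermediate list of lists of Python objects.
a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
for y, row in enumerate(ws.rows):
    # cell.value is a number or a numeric string; empty cells are None
    a[y] = [float(cell.value) if cell.value is not None else 0.0
            for cell in row]

print(a.shape, a.nbytes)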
 


