Out of memory while reading excel file

Peter Otten __peter__ at web.de
Wed May 10 11:48:24 EDT 2017


Mahmood Naderan via Python-list wrote:

> Thanks for your reply. The openpyxl part (reading the workbook) works
> fine. I printed some debug information and found that when it reaches the
> np.array, after some 10 seconds, the memory usage goes high.
> 
> 
> So, I think numpy is unable to manage the memory.

Hm, I think numpy is designed to manage huge arrays if you have enough RAM.

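Anyway: are all values of the same type? If so, the numpy array can be kept
much smaller than in the general case (I think), because a float array stores
8 bytes per value rather than a reference to a separate Python object. You can
also avoid the intermediate list of lists: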

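import numpy
from openpyxl import load_workbook

# read_only=True makes openpyxl read rows lazily instead of all at once
wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

# preallocate the result, then fill it one row at a time
a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]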
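
To try the row-by-row fill without the spreadsheet at hand, here is a minimal
sketch that runs on its own. The names fill_array and fake_rows are made up
for illustration; fake_rows is only a stand-in for the cell values that
ws.rows would deliver, and the sizes are arbitrary.

```python
import numpy


def fill_array(rows, n_rows, n_cols):
    """Copy an iterable of rows into a preallocated float array."""
    a = numpy.zeros((n_rows, n_cols), dtype=float)
    for y, row in enumerate(rows):
        # only one row exists as a Python list at any time
        a[y] = row
    return a


def fake_rows(n_rows, n_cols):
    """Hypothetical stand-in for the values of ws.rows."""
    for y in range(n_rows):
        yield [float(y * n_cols + x) for x in range(n_cols)]


a = fill_array(fake_rows(1000, 50), 1000, 50)
print(a.shape, a.dtype, a.nbytes)
```

With dtype=float every value takes 8 bytes, so a.nbytes is simply
rows * columns * 8, and no list of lists holding the whole sheet is ever
built.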




