Out of memory while reading excel file

Peter Otten __peter__ at web.de
Wed May 10 10:51:41 EDT 2017


Mahmood Naderan via Python-list wrote:

> Hello,
> 
> The following code, which uses openpyxl and numpy, fails to read large
> Excel (xlsx) files. The file is 20 MB and contains 100K rows and 50
> columns.
> 
> 
> W = load_workbook(fname, read_only=True)
> p = W.worksheets[0]
> a = []
> m = p.max_row
> n = p.max_column
> np.array([[i.value for i in j] for j in p.rows])
> 
> How can I fix that? I am stuck at this problem. For medium-sized files
> (16K rows and 50 columns) it is fine.

The docs at

https://openpyxl.readthedocs.io/en/default/optimized.html#read-only-mode

promise "(near) constant memory consumption" for the sample script below:

from openpyxl import load_workbook
wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']

for row in ws.rows:
    for cell in row:
        print(cell.value)

If you change only the file and worksheet names to match yours, does the
script run to completion in reasonable time (redirect stdout to /dev/null)
and with reasonable memory usage?
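
If you would rather measure from inside Python than watch the process from 
outside, here is a minimal sketch along the same lines (the file name is a 
placeholder; tracemalloc only tracks Python-level allocations, which should 
dominate here):

import tracemalloc
from openpyxl import load_workbook

tracemalloc.start()

wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb.worksheets[0]

nrows = 0
for row in ws.rows:
    for cell in row:
        cell.value        # touch the value, but don't keep it around
    nrows += 1

current, peak = tracemalloc.get_traced_memory()
print(nrows, "rows read, peak traced memory:", peak // 2**20, "MiB")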

If it does, you may be wasting memory elsewhere; otherwise you might need to 
convert the xlsx file to csv using your spreadsheet application before 
processing the data in Python.
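
For the first case, the usual culprit is materialising every cell as a 
Python object before handing the whole thing to numpy, which is what the 
nested list comprehension in your snippet does. If the cells are purely 
numeric (no empty cells), one possible way around that is to preallocate the 
array and fill it row by row while streaming. A rough sketch, assuming fname 
and max_row/max_column are available as in your code:

import numpy as np
from openpyxl import load_workbook

wb = load_workbook(fname, read_only=True)
ws = wb.worksheets[0]

# read-only mode streams the rows, so only one row of cell objects
# needs to be alive at a time
a = np.empty((ws.max_row, ws.max_column), dtype=float)
for i, row in enumerate(ws.rows):
    a[i] = [cell.value for cell in row]

If you go the csv route instead, numpy can load the converted file directly 
with numpy.loadtxt or numpy.genfromtxt.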



