[SciPy-user] Very slow loadmat in scipy 0.7 (regression)
Matthieu Brucher
matthieu.brucher at gmail.com
Sun Feb 22 06:58:43 EST 2009
Hi,
This issue came up on the scipy-dev mailing list and will be fixed in the future.
Matthieu
2009/2/22 Antonino Ingargiola <tritemio at gmail.com>:
> Hi to the list,
>
> I'm loading MATLAB files of a few tens of MB in Python with
> scipy.io.loadmat. With scipy 0.6 (the stock Ubuntu 8.10 version) a load
> takes a few seconds (2-5 s). With scipy 0.7 it takes much longer,
> around 80 s.
>
> I profiled it and found that almost all the time is spent in the
> GzipInputStream.__fill method. On a hunch I changed the
> GzipInputStream.blocksize attribute from 16K to 256K and then to 1M,
> and performance improved dramatically. Here are the profile results
> for loading a 33M MATLAB file:
>
> *Scipy 0.7 default, BUFFER 16K*
>
> 12984 function calls (12981 primitive calls) in 140.456 CPU seconds
>
> Ordered by: internal time
> List reduced from 40 to 3 due to restriction <3>
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        27  139.250    5.157  140.304    5.196 gzipstreams.py:80(__fill)
>      2119    0.950    0.000    0.950    0.000 {built-in method decompress}
>         9    0.123    0.014    0.123    0.014 {method 'copy' of 'numpy.ndarray' objects}
>
>
> *BUFFER 256K*
>
> 1080 function calls (1077 primitive calls) in 9.988 CPU seconds
>
> Ordered by: internal time
> List reduced from 40 to 3 due to restriction <3>
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        27    8.870    0.329    9.833    0.364 gzipstreams.py:80(__fill)
>       135    0.925    0.007    0.925    0.007 {built-in method decompress}
>         9    0.124    0.014    0.124    0.014 {method 'copy' of 'numpy.ndarray' objects}
>
>
> *BUFFER 1M*
>
> 480 function calls (477 primitive calls) in 3.509 CPU seconds
>
> Ordered by: internal time
> List reduced from 40 to 3 due to restriction <3>
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        27    2.329    0.086    3.302    0.122 gzipstreams.py:80(__fill)
>        35    0.925    0.026    0.925    0.026 {built-in method decompress}
>         9    0.124    0.014    0.124    0.014 {method 'copy' of 'numpy.ndarray' objects}
>
>
>
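The reports above have the shape of Python's standard cProfile/pstats output, sorted by internal time and cut to three entries. A minimal sketch for producing such a report, using only the standard library; profile_top is a hypothetical helper name, not a SciPy function, and the zlib workload is only a stand-in for the real loadmat call:

```python
import cProfile
import io
import pstats


def profile_top(func, *args, n=3, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return the top-n report.

    Entries are sorted by internal time ("tottime"), like the profiles
    quoted above. profile_top is a hypothetical helper, not part of SciPy.
    """
    prof = cProfile.Profile()
    prof.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(prof, stream=out)
    stats.sort_stats("time").print_stats(n)  # "time" == internal time
    return out.getvalue()


if __name__ == "__main__":
    import zlib

    # Stand-in workload; with SciPy installed one could instead call
    # profile_top(scipy.io.loadmat, "test.mat") on the file in question.
    blob = zlib.compress(bytes(1_000_000))
    print(profile_top(lambda: [zlib.decompress(blob) for _ in range(20)]))
```

Sorting by internal time rather than cumulative time is what singles out __fill: its own code, not the decompress calls it makes, accounts for almost all of the 140 s in the 16K run.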
> As you can see, the improvement is dramatic: the time drops from
> 140 to about 3.5 seconds.
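A quick check of the three profiles is consistent with this: dividing the __fill internal time by the number of decompress calls gives almost the same cost per block in every run, so the total falls roughly in proportion to the number of blocks. The figures below are copied from the profiles; the interpretation is an inference from three data points, not a diagnosis of the gzipstreams code:

```python
# Numbers copied from the three profiles quoted above:
# (GzipInputStream.blocksize in bytes, decompress ncalls, __fill tottime in s)
PROFILES = [
    (16 * 1024, 2119, 139.250),
    (256 * 1024, 135, 8.870),
    (1024 * 1024, 35, 2.329),
]


def summarize(profiles):
    """Per run: block size (KiB), calls, MB spanned by all blocks, ms per call."""
    rows = []
    for blocksize, ncalls, fill_tottime in profiles:
        # Upper bound on compressed bytes read (the last block may be partial).
        spanned_mb = ncalls * blocksize / 1e6
        # __fill's own time divided evenly over the blocks it processed.
        ms_per_call = 1000 * fill_tottime / ncalls
        rows.append((blocksize // 1024, ncalls, spanned_mb, ms_per_call))
    return rows


for kib, ncalls, spanned_mb, ms in summarize(PROFILES):
    print(f"{kib:5d} KiB  {ncalls:5d} calls  {spanned_mb:5.1f} MB spanned  {ms:5.1f} ms/call")
```

This prints about 65.7, 65.7 and 66.5 ms per call, and the blocks span 34.7, 35.4 and 36.7 MB, in the same range as the 33M test file. A roughly constant per-block overhead would explain why going from 16K to 1M cuts the time by about the same factor as the call count drops (2119 / 35 is roughly 60).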
>
> I think the default value should be raised (to at least 256K), but
> since the performance hit can be so large, it would definitely be
> better to also expose it as a keyword argument of io.loadmat.
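Until such a keyword exists, the change described above can be scoped to a single call. A minimal sketch, assuming (from the message, not checked against the SciPy 0.7 source) that the block size is a class attribute named blocksize on GzipInputStream in the module scipy.io.matlab.gzipstreams; if instances set their own blocksize in __init__, patching the class would have no effect. temporarily_set is a hypothetical helper:

```python
from contextlib import contextmanager


@contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> = value inside the with-block, restoring the old value after.

    temporarily_set is a hypothetical helper, not part of SciPy.
    """
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, old)  # restored even if the body raises


# Hypothetical usage with SciPy 0.7 (module path and attribute name are
# assumptions taken from the message and profile output above):
#
#   import scipy.io
#   from scipy.io.matlab import gzipstreams
#   with temporarily_set(gzipstreams.GzipInputStream, "blocksize", 256 * 1024):
#       contents = scipy.io.loadmat("test.mat")
```

Restoring the value in a finally clause keeps the override from leaking into later loadmat calls, which is the same scoping a per-call keyword argument would give.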
>
> Any comment is appreciated.
>
> - Antonio
>
> PS: the test file used for the profiling is attached.
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>
>
--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher