[Numpy-discussion] "import numpy" is slow

Andrew Dalke dalke at dalkescientific.com
Thu Jul 31 06:36:46 EDT 2008


On Jul 31, 2008, at 11:42 AM, Stéfan van der Walt wrote:
> Maybe when we're convinced that there is a lot to be gained from
> making such a change.  From my perspective, it doesn't look good:
>
> I) Major code breakage
> II) Confused users
> III) More difficult function discovery for beginners

I'm not asking for a change, and I fully realize the costs you list.
I happen to think the current behavior is a mistake and that there were
other ways to address the underlying requirement, but I know that's not
going to change.  (For example, follow the matplotlib approach, where
there's a special library designed to be imported for interactive use.
But I am *not* proposing this change.)

I point out that this makes numpy different from most other
Python packages.  Had this not been done then
   I) would not be a problem,
   II) is I think a wash, because people starting with numpy will still
wonder why

 >>> import PIL
 >>> PIL.Image
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Image'
 >>> import PIL.Image
 >>> PIL.Image
<module 'PIL.Image' from '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.pyc'>
 >>>

and

 >>> import xml
 >>> xml.etree
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'etree'
 >>> from xml import etree
 >>> xml.etree
<module 'xml.etree' from '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/etree/__init__.pyc'>
 >>>


occur.

   III) assumes there couldn't have been other solutions.  And it  
assumes that the difficulties are large, which I haven't seen in my  
experience.
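
The PIL and xml sessions above show the usual Python convention: a
submodule becomes an attribute of its package only after something has
imported it.  A minimal sketch of that behavior, assuming a throwaway
package named "demo_pkg" (a hypothetical name, created in a temporary
directory purely for illustration, with an empty __init__.py):

```python
import importlib
import sys
import tempfile
from pathlib import Path


def submodule_visibility():
    """Return (before, after): is demo_pkg.sub an attribute of demo_pkg
    before and after the submodule is explicitly imported?"""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"          # hypothetical package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")  # empty: imports nothing eagerly
        (pkg_dir / "sub.py").write_text("VALUE = 42\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module("demo_pkg")
            before = hasattr(pkg, "sub")             # submodule not loaded yet
            importlib.import_module("demo_pkg.sub")  # like "import demo_pkg.sub"
            after = hasattr(pkg, "sub")              # now bound on the package
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg.sub", None)
            sys.modules.pop("demo_pkg", None)
    return before, after


print(submodule_visibility())  # (False, True)
```

A package whose __init__.py imports its submodules eagerly would report
True both times, at the cost of loading everything up front.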

> I) Slight improvement in startup speed.

The user base for numpy might be .. 10,000 people?  100,000 people?
Let's go with the latter, and assume that command-line scripts,
CGI scripts, and the other programs that people write to help do
research mean that numpy is started on average 10 times a day.

100,000 people * 10 times / day * 0.1 seconds per startup
    = almost 28 people-hours spent each day waiting for numpy to start.

I'm willing to spend a few days to achieve that.

Perhaps there are fewer people than I'm estimating.  OTOH, perhaps
there are more imports of numpy per day.  An order of magnitude less  
time is still a couple of hours each day as the world waits to import  
all of the numpy libraries.

If on average people import numpy 10 times a day and it could be made
0.1 seconds faster then that's 1 second per person per day.  If it
takes on average 5 minutes (300 seconds) to learn to import the module
directly and the onus is all on numpy, then after about 300 days, or
roughly 1 year of use, the efficiency has made up for it, and the
benefits continue to grow.
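
As a rough check of the arithmetic above, here is a minimal sketch.  The
function names are illustrative only, and the inputs are the guesses
made in this message (user count, imports per day, 0.1 seconds saved,
5 minutes of learning time), not measurements:

```python
SECONDS_PER_HOUR = 3600


def daily_wait_hours(users, imports_per_day, seconds_per_import):
    """Total people-hours per day spent on the given per-import delay."""
    return users * imports_per_day * seconds_per_import / SECONDS_PER_HOUR


def break_even_days(learning_minutes, imports_per_day, seconds_saved):
    """Days of use until the per-import saving repays a one-time learning cost."""
    return learning_minutes * 60 / (imports_per_day * seconds_saved)


# 100,000 users, 10 imports a day, 0.1 s each
print(round(daily_wait_hours(100_000, 10, 0.1), 1))  # 27.8
# An order of magnitude fewer users
print(round(daily_wait_hours(10_000, 10, 0.1), 1))   # 2.8
# 5 minutes of learning repaid at 1 second saved per day
print(round(break_even_days(5, 10, 0.1)))            # 300
```

Changing any one guess by a factor of ten scales the totals by the same
factor, so the conclusion depends mostly on whether the estimates are
in the right ballpark.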

Slight improvements add up when multiplied by everyone.  The goals of  
numpy when it started aren't going to be the same as when it's a  
mature, widely used and deployed package.

				Andrew
				dalke at dalkescientific.com




