[Numpy-discussion] "import numpy" performance

Nathaniel Smith njs at pobox.com
Mon Jul 2 18:21:08 EDT 2012


On Mon, Jul 2, 2012 at 10:44 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> On Jul 2, 2012, at 10:34 PM, Nathaniel Smith wrote:
>> I don't have any opinion on how acceptable this would be, but I also
>> don't see a benchmark showing how much this would help?
>
> The profile output was lower in that email. The relevant line is
>
> 0.038 add_newdocs (numpy.core.multiarray)

Yes, but for a proper benchmark we need to compare this to the number
that we would get with some other implementation... I'm assuming you
aren't proposing we just delete the docstrings :-).

> This says that 'add_newdocs', which is imported from
> numpy.core.multiarray (though there may be other importers)
> takes 0.038 seconds to go through __import__, including
> all of its children module imports.

There are no "children modules"; all these modules refer to each
other, and you're assuming that whichever module you happen to load
first is responsible for all the other modules it happens to
reference.
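
To illustrate the attribution problem, here is a minimal sketch (not taken
from the profile above) that times the same import twice. The helper names
are made up for illustration, and the stdlib module colorsys is only a
stand-in. The first import pays for the real load; the second is served
from the sys.modules cache, so whoever imports first gets "blamed".

```python
import importlib
import sys
import time


def timed_import(name):
    """Return the wall-clock seconds spent importing `name`."""
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


def first_vs_cached(name):
    """Time a real load of `name`, then a second import served from cache."""
    # Drop any cached entry so the first import really loads the module.
    sys.modules.pop(name, None)
    first = timed_import(name)
    # The module is now in sys.modules, so this is only a cache lookup.
    cached = timed_import(name)
    return first, cached


if __name__ == "__main__":
    first, cached = first_vs_cached("colorsys")
    print(f"first import:  {first:.6f} s")
    print(f"cached import: {cached:.6f} s")
```

The cached import is typically far cheaper, which is why a per-importer
profile depends heavily on import order.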

>     add_newdocs: 0.067 (numpy.core.multiarray)
>      numpy.lib: 0.061 (add_newdocs)

I'm pretty sure that what these two lines say is that the actual
add_newdocs code only takes 0.006 seconds?
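
The arithmetic behind that 0.006 can be sketched as follows, assuming each
number in the profile is a cumulative time that includes the imports nested
under it. The function name self_time is hypothetical.

```python
def self_time(cumulative, child_cumulative):
    """Time spent in a module itself, excluding its nested imports."""
    return cumulative - sum(child_cumulative)


# Numbers copied from the two quoted profile lines.
add_newdocs_self = self_time(0.067, [0.061])
print(f"add_newdocs self time: {add_newdocs_self:.3f} s")
```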

>         numpy.testing: 0.041 (numpy.core.numeric)

However, it does look like numpy.testing is responsible for something
like 35% of our startup overhead and for pulling in a ton of extra
modules (with associated disk seeks), which is pretty dumb.
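
A rough check of that share, assuming the total is the 0.119 s figure
quoted from the earlier message and that 0.041 s is cumulative for
numpy.testing and everything it pulls in:

```python
def share_of_total(part, total):
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


# 0.041 s for numpy.testing against an assumed 0.119 s total.
testing_share = share_of_total(0.041, 0.119)
print(f"numpy.testing: about {testing_share:.0f}% of import time")
```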

>>> With instrumentation I found that 0.083s of the 0.119s
>>> is spent loading numpy.core.multiarray.

The number 0.083 doesn't appear anywhere in that profile you pasted,
so I don't know where this comes from...

Anyway, it sounds like the answer is that importing
numpy.core.multiarray doesn't take that long; you're measuring the
total time to do 'import numpy', and it just happens that
numpy.core.multiarray is the first module you load. (BTW, you probably
shouldn't be importing numpy.core.multiarray directly at all, just do
'import numpy'.)
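
One way to measure the total cost of an import without cache effects is to
time a fresh interpreter. This is only a sketch: the function names are
made up, "import sys" is used as an approximate baseline for bare
interpreter startup, and the numbers will vary with disk cache state.

```python
import subprocess
import sys
import time


def import_seconds(module, repeats=3):
    """Best-of-N wall time for a fresh interpreter to import `module`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module, repeats=3):
    """Import time minus the approximate cost of starting the interpreter."""
    baseline = import_seconds("sys", repeats)
    return import_seconds(module, repeats) - baseline


if __name__ == "__main__":
    # Swap in "numpy" where it is installed; "json" is just a stand-in.
    print(f"import overhead: {import_overhead('json'):.3f} s")
```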

-N


