[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Fri Oct 28 13:12:41 EDT 2016

On Fri, Oct 28, 2016 at 12:53 PM, Nicolas Chopin <nicolas.chopin at ensae.fr>
wrote:

> Hi list,
> I'm working on a package that does some complicated Monte Carlo
> experiments. The package passes around frozen distributions quite a lot.
> Trying to understand why certain parts were so slow, I did a bit of
> profiling, and stumbled upon this:
>
> > %timeit x = scipy.stats.norm.rvs(size=1000)
> > 10000 loops, best of 3: 49.3 µs per loop
>
> > %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> > 1000 loops, best of 3: 512 µs per loop
>

Can you time just the rvs call here, and not the instantiation of the frozen
distribution?
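
Something along these lines would separate the two costs. This is only a
sketch: the variable names and the repeat count of 200 are arbitrary choices
of mine, and the absolute numbers will depend on the machine and the scipy
version.

```python
import timeit

from scipy import stats

N_REPEATS = 200  # arbitrary; large enough to average out noise

# Cost of constructing (freezing) the distribution only.
t_freeze = timeit.timeit(lambda: stats.norm(), number=N_REPEATS)

# Cost of rvs on a distribution that was frozen once, outside the timed call.
dist = stats.norm()
t_rvs_frozen = timeit.timeit(lambda: dist.rvs(size=1000), number=N_REPEATS)

# Cost of rvs called directly on the global (non-frozen) distribution.
t_rvs_direct = timeit.timeit(lambda: stats.norm.rvs(size=1000), number=N_REPEATS)

for label, total in [
    ("freeze only", t_freeze),
    ("rvs on frozen dist", t_rvs_frozen),
    ("rvs on stats.norm", t_rvs_direct),
]:
    # Report the mean time per call in microseconds.
    print(f"{label:>20}: {total / N_REPEATS * 1e6:8.1f} µs per call")
```

If most of the time shows up in the "freeze only" line, then creating the
frozen distribution once and reusing it should remove most of the penalty.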

Frozen distributions now have more overhead in their construction because a
new instance of the distribution is created instead of reusing the global
instance, as older scipy versions did. That might still have an effect in the
µs range.
(The reason was to avoid the possibility of spillover of attributes across
instances.)

>
> So there is a 10x penalty when using a frozen dist, even if the size of the
> simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I
> cannot replicate this problem on another machine with scipy 0.13.3 and
> Ubuntu 14.04 (there is a penalty, but it's much smaller).
>
> In the profiler, I can see that a lot of time is spent doing string
> operations (such as expand_tabs) in order to generate the doc. In the
> source, I see that this may depend on a certain -OO flag???
>
> I do realise that instantiating a frozen distribution requires some
> argument checking and what not, but here it looks too expensive. For my
> package, this amounts to hours spent on ... tab expansion?
>
> Anyway, I'd like to ask
> (a) is this a known problem? I could not find anything online about this.
> (b) Is this going to be fixed in some future version of scipy?
> (c) is there a way to fix this with *this* version of scipy using this
> flag mentioned in the source, and then how?
> (d) or should I instead manually redefine my own distribution objects?
> (it's really convenient for what I'm trying to do to define distributions
> as objects with methods rvs, logpdf, and so on).
>

I don't think we have ever had a discussion on timing details. Overall, the
overhead of scipy.stats.distributions is not small relative to the
underlying calculation when that calculation is fast. For example, using
numpy.random directly for rvs is quite a bit faster when the function is
available in numpy.
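
If you go the route of defining your own distribution objects, a thin wrapper
around numpy.random might look roughly like this. It is a sketch under my own
assumptions: the class name FastNormal is made up, it covers only the normal
distribution, it does no argument checking, and it uses
numpy.random.default_rng, which needs a reasonably recent numpy (with older
versions, numpy.random.normal would play the same role).

```python
import numpy as np


class FastNormal:
    """Hypothetical minimal normal distribution with rvs and logpdf.

    Not part of scipy; it skips all of scipy.stats' argument checking
    and docstring generation, so construction is cheap.
    """

    def __init__(self, loc=0.0, scale=1.0, seed=None):
        self.loc = loc
        self.scale = scale
        # Each instance owns its own random generator.
        self._rng = np.random.default_rng(seed)

    def rvs(self, size=None):
        # Draw directly from numpy.random, bypassing scipy.stats.
        return self._rng.normal(loc=self.loc, scale=self.scale, size=size)

    def logpdf(self, x):
        # Closed-form log density of N(loc, scale**2).
        z = (np.asarray(x, dtype=float) - self.loc) / self.scale
        return -0.5 * z**2 - np.log(self.scale) - 0.5 * np.log(2.0 * np.pi)


dist = FastNormal(loc=0.0, scale=1.0, seed=0)
x = dist.rvs(size=1000)
print(x.shape, dist.logpdf(0.0))
```

The trade-off is that you lose the generic machinery of scipy.stats (cdf, ppf,
fit, and so on) and have to write each density yourself.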

Josef

>
> Many thanks for reading this! :-)
> All the best