[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Nicolas Chopin nicolas.chopin at ensae.fr
Fri Oct 28 13:21:45 EDT 2016


If I time just the rvs call then I get essentially the same time as with
> x = scipy.stats.norm.rvs(size=1000)

so yes, it's the initialisation of the frozen distribution that costs so
much. And, in my case, it seems it adds up to quite a lot.
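
For anyone who wants to reproduce the split, here is a minimal sketch that times construction and sampling separately with the standard `timeit` module. The helper `time_call` and the repeat counts are illustrative choices, not something from scipy, and the absolute numbers will vary with the scipy version and the machine.

```python
import timeit

import scipy.stats


def time_call(func, number=200):
    """Best average time per call in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


# Freeze once, outside the timed region, so construction cost is excluded
# from the "rvs on existing frozen dist" measurement.
dist = scipy.stats.norm()

timings = {
    "construct only": time_call(lambda: scipy.stats.norm()),
    "rvs on existing frozen dist": time_call(lambda: dist.rvs(size=1000)),
    "rvs without freezing": time_call(lambda: scipy.stats.norm.rvs(size=1000)),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e6:.1f} µs per call")
```

If the first entry dominates, creating the frozen object once and reusing it avoids paying the construction cost on every call.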

So what you're saying is that there was indeed a recent change that makes
frozen dist creation more expensive? So that's "a feature, not a bug"? In
that case, I will create my own classes. A pity, but well...
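
A rough sketch of what such a class could look like for the normal case, assuming only the `rvs` and `logpdf` methods are needed. The class name `Normal` and its `seed` argument are hypothetical, it relies on `numpy.random.default_rng` (so a reasonably recent NumPy), and it skips the argument checking and broadcasting that scipy does.

```python
import math

import numpy as np


class Normal:
    """Lightweight normal distribution (hypothetical stand-in, not a scipy class)."""

    def __init__(self, loc=0.0, scale=1.0, seed=None):
        if scale <= 0:
            raise ValueError("scale must be positive")
        self.loc = loc
        self.scale = scale
        # Each instance owns its generator; construction stays cheap.
        self._rng = np.random.default_rng(seed)

    def rvs(self, size=None):
        """Draw samples from N(loc, scale**2)."""
        return self._rng.normal(self.loc, self.scale, size=size)

    def logpdf(self, x):
        """Log density of N(loc, scale**2) evaluated at x."""
        z = (np.asarray(x, dtype=float) - self.loc) / self.scale
        return -0.5 * z**2 - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)


dist = Normal(loc=1.0, scale=2.0, seed=0)
x = dist.rvs(size=1000)
lp = dist.logpdf(x)
```

Such an object can be passed around like a frozen distribution, at the price of re-implementing each distribution that is needed.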

Thanks a lot for your prompt answer
Nicolas

On Fri, 28 Oct 2016 at 19:12 <josef.pktd at gmail.com> wrote:

> On Fri, Oct 28, 2016 at 12:53 PM, Nicolas Chopin <nicolas.chopin at ensae.fr>
> wrote:
>
>  Hi list,
> I'm working on a package that does some complicated Monte Carlo
> experiments. The package passes around frozen distributions quite a lot.
> Trying to understand why certain parts were so slow, I did a bit of
> profiling, and stumbled upon this:
>
> > %timeit x = scipy.stats.norm.rvs(size=1000)
> > 10000 loops, best of 3: 49.3 µs per loop
>
> > %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> > 1000 loops, best of 3: 512 µs per loop
>
>
> Can you time just the rvs call here, and not the instantiation of the
> frozen distribution?
>
> Frozen distributions now have more overhead in construction because a
> new instance of the distribution is created instead of reusing the global
> instance as in older scipy versions. That might still have an effect in the
> µs range.
> (The reason was to avoid the possibility of spillover of attributes across
> instances.)
>
>
>
>
> So a x10 penalty when using a frozen dist, even if the size of the
> simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I
> cannot replicate this problem on another machine with scipy 0.13.3 and
> Ubuntu 14.04 (there is a penalty, but it's much smaller).
>
> In the profiler, I can see that a lot of time is spent doing string
> operations (such as expand_tabs) in order to generate the doc. In the
> source, I see that this may depend on a certain -OO flag???
>
> I do realise that instantiating a frozen distribution requires some
> argument checking and what not, but here it looks too expensive. For my
> package, this amounts to hours spent on ... tab extensions?
>
> Anyway, I'd like to ask
> (a) is this a known problem? I could not find anything on-line about this.
> (b) Is this going to be fixed in some future version of scipy?
> (c) is there a way to fix this with *this* version of scipy using this
> flag mentioned in the source, and then how?
> (d) or should I instead manually re-define my own distribution objects?
> (it's really convenient for what I'm trying to do to define distributions
> as objects with methods rvs, logpdf, and so on).
>
>
> I think we never had any discussion on timing details. Overall, the
> overhead of scipy.stats.distributions is not small relative to the
> underlying calculation when that calculation is fast, e.g. using
> numpy.random directly for rvs is quite a bit faster, when the function is
> available in numpy.
>
> Josef
>
>
>
> Many thanks for reading this! :-)
> All the best
>
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> https://mail.scipy.org/mailman/listinfo/scipy-user
>