[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Oct 28 13:37:08 EDT 2016


On Fri, Oct 28, 2016 at 1:21 PM, Nicolas Chopin <nicolas.chopin at ensae.fr>
wrote:

> If I time just the rvs call then I get essentially the same time as with
> > x = scipy.stats.norm.rvs(size=1000)
>
> so yes, it's the initialisation of the frozen distribution that costs so
> much. And, in my case, it seems it adds up to quite a lot.
>
> So what you're saying is that there was indeed a recent change that makes
> frozen dist creation more expensive? So that's "a feature, not a bug"? In
> that case, I will create my own classes. A pity, but well...
>

Creating a new instance is a feature. Some speedup may still be possible
in the implementation, but AFAIR I didn't see anything obvious (a few µs
up or down?)
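
To see where the time goes, the two steps can be timed separately. This is
only a minimal sketch using the standard-library timeit module; the number
of calls is an arbitrary choice, and the absolute figures will depend on
the scipy version and the machine.

```python
import timeit

import scipy.stats

n_calls = 200  # arbitrary; just enough calls to average out some noise

# Cost of building a frozen distribution (no sampling involved).
t_create = timeit.timeit(lambda: scipy.stats.norm(), number=n_calls) / n_calls

# Cost of sampling from a frozen distribution that already exists.
dist = scipy.stats.norm()
t_rvs = timeit.timeit(lambda: dist.rvs(size=1000), number=n_calls) / n_calls

print("create frozen: %.1f us per call" % (t_create * 1e6))
print("rvs only:      %.1f us per call" % (t_rvs * 1e6))
```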

However, given your description that you pass the frozen instances around,
you shouldn't need to create so many instances. Otherwise you could also
use the unfrozen global instance of the distributions.
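
A minimal sketch of both options, assuming a normal distribution; the
values of mu and sigma and the loop length are made up for illustration.

```python
import numpy as np
import scipy.stats

mu, sigma = 1.0, 2.0  # example parameters, chosen only for illustration

# Option 1: freeze once, outside the loop, and reuse the same instance.
dist = scipy.stats.norm(loc=mu, scale=sigma)
samples_frozen = [dist.rvs(size=1000) for _ in range(10)]

# Option 2: skip freezing and pass the parameters to the global instance.
samples_global = [scipy.stats.norm.rvs(loc=mu, scale=sigma, size=1000)
                  for _ in range(10)]

# Both routes evaluate the same density.
x = np.linspace(-3.0, 5.0, 9)
same_logpdf = np.allclose(dist.logpdf(x),
                          scipy.stats.norm.logpdf(x, loc=mu, scale=sigma))
```

In both variants the construction cost is paid at most once rather than on
every iteration.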

In general, I avoid scipy.stats.distributions in loops in restricted cases
where I don't need the flexibility and input checking. But I don't think
writing replacements is worth the effort when we would have to replicate
most of what's already there.
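
For such a restricted case, a hand-written class can be very small. The
sketch below assumes that only a normal distribution with scalar loc and
scale is needed and that no input checking is wanted; the class name
LightNormal is made up for illustration and is not part of scipy.

```python
import numpy as np


class LightNormal(object):
    """Hypothetical minimal normal distribution without input checking."""

    def __init__(self, loc=0.0, scale=1.0):
        self.loc = loc
        self.scale = scale
        # Precompute the constant part of the log-density.
        self._log_norm = -0.5 * np.log(2.0 * np.pi) - np.log(scale)

    def rvs(self, size=None):
        # Calls numpy.random directly, bypassing scipy.stats.
        return np.random.normal(loc=self.loc, scale=self.scale, size=size)

    def logpdf(self, x):
        z = (np.asarray(x) - self.loc) / self.scale
        return self._log_norm - 0.5 * z ** 2


dist = LightNormal(loc=1.0, scale=2.0)
x = dist.rvs(size=1000)
lp = dist.logpdf(x)
```

Everything scipy.stats provides beyond this (shape parameters,
broadcasting, cdf, ppf, fitting, ...) would have to be added by hand, which
is the replication effort mentioned above.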

Josef



>
> Thanks a lot for your prompt answer
> Nicolas
>
> On Fri, 28 Oct 2016 at 19:12 <josef.pktd at gmail.com> wrote:
>
>> On Fri, Oct 28, 2016 at 12:53 PM, Nicolas Chopin
>> <nicolas.chopin at ensae.fr> wrote:
>>
>>  Hi list,
>> I'm working on a package that does some complicated Monte Carlo
>> experiments. The package passes around frozen distributions quite a lot.
>> Trying to understand why certain parts were so slow, I did a bit of
>> profiling, and stumbled upon this:
>>
>>  > %timeit x = scipy.stats.norm.rvs(size=1000)
>> > 10000 loops, best of 3: 49.3 µs per loop
>>
>> > %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
>> > 1000 loops, best of 3: 512 µs per loop
>>
>>
>> Can you time just the rvs call here, and not the instantiation of the
>> frozen distribution?
>>
>> Frozen distributions now have more overhead in construction because a
>> new instance of the distribution is created instead of reusing the global
>> instance as in older scipy versions. That might still have an effect in
>> the µs range.
>> (The reason was to avoid the possibility of spillover of attributes
>> across instances.)
>>
>>
>>
>>
>> So a x10 penalty when using a frozen dist, even if the size of the
>> simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I
>> cannot replicate this problem on another machine with scipy 0.13.3 and
>> Ubuntu 14.04 (there is a penalty, but it's much smaller).
>>
>> In the profiler, I can see that a lot of time is spent doing string
>> operations (such as expand_tabs) in order to generate the doc. In the
>> source, I see that this may depend on a certain -OO flag???
>>
>> I do realise that instantiating a frozen distribution requires some
>> argument checking and what not, but here it looks too expensive. For my
>> package, this amounts to hours spent on ... tab extensions?
>>
>> Anyway, I'd like to ask
>> (a) is this a known problem? I could not find anything on-line about
>> this.
>> (b) Is this going to be fixed in some future version of scipy?
>> (c) is there a way to fix this with *this* version of scipy using this
>> flag mentioned in the source, and then how?
>> (d) or should I instead manually re-define my own distribution objects?
>> (it's really convenient for what I'm trying to do to define distributions
>> as objects with methods rvs, logpdf, and so on).
>>
>>
>> I think we never had any discussion of timing details. Overall, the
>> overhead of scipy.stats.distributions is relatively large when the
>> underlying calculation is fast, e.g. using numpy.random directly for rvs
>> is quite a bit faster when the function is available in numpy.
>>
>> Josef
>>
>>
>>
>> Many thanks for reading this! :-)
>> All the best
>>
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-user
>>