[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Nicolas Chopin nicolas.chopin at ensae.fr
Sat Oct 29 10:51:42 EDT 2016


Hi,
Charles: no, I didn't; I'm not clear on how to use this flag.
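
For what it's worth, -OO is a command-line option of the Python interpreter itself (python -OO script.py), not a scipy setting; it removes docstrings and sets sys.flags.optimize to 2. Below is a rough sketch of how one might check whether it changes the timing; the helper run_with_flags and the snippet it runs are made up for illustration, and no particular speed-up is claimed.

```python
import subprocess
import sys

# Small script run in a child interpreter: it reports the optimization level
# and the average time needed to freeze a normal distribution and draw
# 1000 variates from it.
SNIPPET = """
import sys, timeit
import scipy.stats
n = 200
t = timeit.timeit(lambda: scipy.stats.norm().rvs(size=1000), number=n)
print(sys.flags.optimize, t / n)
"""

def run_with_flags(*flags):
    """Run SNIPPET in a fresh interpreter started with the given flags
    (hypothetical helper, for illustration only)."""
    cmd = [sys.executable] + list(flags) + ["-c", SNIPPET]
    out = subprocess.check_output(cmd).decode()
    level, seconds = out.split()[-2:]
    return int(level), float(seconds)

if __name__ == "__main__":
    # Compare the default interpreter with one started with -OO.
    for flags in [(), ("-OO",)]:
        level, seconds = run_with_flags(*flags)
        print("optimize=%d: %.1f us per call" % (level, seconds * 1e6))
```

The timings depend on the machine and on the scipy version, so the two numbers are only meaningful relative to each other.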

Ralf: since you're asking, I may as well give you more details about what
I'm doing. Basically, I'd like to do some basic probabilistic programming,
i.e. to give the user the ability to define stochastic models as Python
objects; e.g.

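from scipy import stats

class MarkovChain(object):
    """Abstract class; subclasses define M(t, xp), the law of X_t given X_{t-1} = xp."""
    def simulate(self, T):
        path = []
        for t in range(T):
            xp = path[-1] if path else None  # no previous state at t = 0
            path.append(self.M(t, xp).rvs())
        return path

class RandomWalk(MarkovChain):
    def __init__(self, sigma=1.):
        self.sigma = sigma
    def M(self, t, xp):
        # assumption: the walk starts at 0 when there is no previous state
        loc = 0. if xp is None else xp
        return stats.norm(loc=loc, scale=self.sigma)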

Here, I define a base class for Markov chains, with a method simulate that
generates a trajectory of length T. Then I define a particular (parametric)
sub-class, that of Gaussian random walks.

One part of my package defines an algorithm that takes such a *class* as an
argument, generates many possible parameters (above, sigma), and for each
parameter generates trajectories; sometimes the logpdf or the ppf functions
must be computed as well. Of course, I could ask the user to provide as an
input a function for generating rvs, but then I would also need to ask for a
function for computing the log-pdf, and so on.
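
To make this concrete, here is a minimal sketch of such an algorithm. The names explore and path_logpdf, the starting point x0=0 and the list of sigma values are all made up for illustration; the point is only that one frozen object returned by M gives both rvs and logpdf (and ppf), so the user has a single method to write.

```python
import numpy as np
from scipy import stats

class RandomWalk(object):
    """Gaussian random walk; M(t, xp) is the law of X_t given X_{t-1} = xp."""
    def __init__(self, sigma=1.):
        self.sigma = sigma
    def M(self, t, xp):
        return stats.norm(loc=xp, scale=self.sigma)

def path_logpdf(model, path, x0=0.):
    """Sum of the transition log-densities along path, starting from x0
    (the starting point x0 is an assumption of this sketch)."""
    total, xp = 0., x0
    for t, x in enumerate(path):
        total += model.M(t, xp).logpdf(x)
        xp = x
    return total

def explore(model_class, sigmas, T=10, x0=0.):
    """For each sigma, build a model, simulate one trajectory of length T
    and evaluate its log-density (hypothetical algorithm, for illustration)."""
    results = {}
    for sigma in sigmas:
        model = model_class(sigma=sigma)
        path, xp = [], x0
        for t in range(T):
            xp = model.M(t, xp).rvs()  # a new frozen object at every step
            path.append(xp)
        results[sigma] = (np.array(path), path_logpdf(model, path, x0))
    return results

if __name__ == "__main__":
    for sigma, (path, lp) in explore(RandomWalk, [0.5, 1., 2.], T=5).items():
        print(sigma, path, lp)
```

Note that a frozen distribution is created at every time step and for every parameter, which is why the cost of freezing matters so much in this kind of code.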

In fact, I have a few ideas (and prototype code) on how to extend frozen
distributions so as to do more advanced probabilistic programming, such as:
* product distributions:
  prod_dist(stats.beta(3,2), stats.norm(loc=3))
  returns an object that corresponds to the distribution of (X,Y), where
  X~Beta(3,2) and Y~N(3,1) are independent; for instance, if you apply
  method rvs with size=N, you obtain a numpy array of shape [N,2];
* dict distributions:
  same idea, but rvs returns a record array (and logpdf, etc., take a
  record array as input).
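
A minimal sketch of what these two objects could look like is given below. This is not the prototype code mentioned above; the class names ProdDist and DictDist and their methods are hypothetical, and only rvs and logpdf are covered. It assumes the components are independent, univariate frozen distributions.

```python
import numpy as np
from scipy import stats

class ProdDist(object):
    """Hypothetical product of independent univariate frozen distributions."""
    def __init__(self, *dists):
        self.dists = dists
    def rvs(self, size=1):
        # one column per component: the result has shape (size, len(dists))
        return np.column_stack([d.rvs(size=size) for d in self.dists])
    def logpdf(self, x):
        # independence: the joint log-density is the sum of the marginal ones
        x = np.atleast_2d(x)
        return sum(d.logpdf(x[:, i]) for i, d in enumerate(self.dists))

class DictDist(object):
    """Same idea with named components, stored in a record (structured) array."""
    def __init__(self, **dists):
        self.dists = dists
        self.dtype = [(name, float) for name in dists]
    def rvs(self, size=1):
        out = np.empty(size, dtype=self.dtype)
        for name, d in self.dists.items():
            out[name] = d.rvs(size=size)
        return out
    def logpdf(self, x):
        # x is a structured array with one field per component
        return sum(d.logpdf(x[name]) for name, d in self.dists.items())

if __name__ == "__main__":
    p = ProdDist(stats.beta(3, 2), stats.norm(loc=3))
    x = p.rvs(size=5)
    print(x.shape, p.logpdf(x))
    d = DictDist(a=stats.beta(3, 2), b=stats.norm(loc=3))
    y = d.rvs(size=5)
    print(y.dtype.names, d.logpdf(y))
```

Both classes expose the same rvs/logpdf interface as a frozen distribution, so they could in principle be passed wherever a frozen distribution is expected.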

But I'm not sure there's much interest in extending scipy distributions in
this way?
Best

On Sat, 29 Oct 2016 at 15:06 Charles R Harris <charlesr.harris at gmail.com>
wrote:

> On Fri, Oct 28, 2016 at 10:53 AM, Nicolas Chopin <nicolas.chopin at ensae.fr>
> wrote:
>
> > Hi list,
> > I'm working on a package that does some complicated Monte Carlo
> > experiments. The package passes around frozen distributions quite a lot.
> > Trying to understand why certain parts were so slow, I did a bit of
> > profiling, and stumbled upon this:
> >
> >   %timeit x = scipy.stats.norm.rvs(size=1000)
> >   10000 loops, best of 3: 49.3 µs per loop
> >
> >   %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> >   1000 loops, best of 3: 512 µs per loop
> >
> > So a 10x penalty when using a frozen dist, even if the size of the
> > simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I
> > cannot replicate this problem on another machine with scipy 0.13.3 and
> > Ubuntu 14.04 (there is a penalty, but it's much smaller).
> >
> > In the profiler, I can see that a lot of time is spent doing string
> > operations (such as expand_tabs) in order to generate the doc. In the
> > source, I see that this may depend on a certain -OO flag?
>
>
> Did you try running with the -OO flag? Anyone know how well that works?
>
> Chuck