[SciPy-Dev] creation / pickling of stats distributions

Andrew Nelson andyfaff at gmail.com
Tue Jul 14 19:03:01 EDT 2020


I have some code that uses multiprocessing.Pool for parallelisation. This
requires that an object be pickled. This object has an `rv_frozen`
distribution as an attribute. It turns out that performance is much
improved if the `rv_frozen` distribution is not present: pickling
`rv_frozen` objects is expensive, and so is creating them.

```
>>> import scipy.stats as stats
>>> import pickle
>>> %timeit stats.norm(scale=1, loc=1)
694 µs ± 123 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> rv = stats.norm(scale=1, loc=1)
>>> %timeit s = pickle.dumps(rv); pickle.loads(s)
1.02 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
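One workaround I've been considering is to not keep the frozen distribution on the object at all: store only the parameters, drop the cached distribution in `__getstate__`, and rebuild it lazily on first access after unpickling. A minimal sketch (the `Model` class and its attributes are hypothetical, just for illustration):

```python
import pickle
import scipy.stats as stats

class Model:
    """Hypothetical container: keeps only loc/scale, rebuilds the
    rv_frozen lazily so it never travels through pickle."""

    def __init__(self, loc=1.0, scale=1.0):
        self.loc = loc
        self.scale = scale
        self._rv = None  # cached frozen distribution, never pickled

    @property
    def rv(self):
        # Rebuild the frozen distribution on first access.
        if self._rv is None:
            self._rv = stats.norm(loc=self.loc, scale=self.scale)
        return self._rv

    def __getstate__(self):
        # Drop the expensive rv_frozen before pickling.
        state = self.__dict__.copy()
        state["_rv"] = None
        return state

m = Model(loc=2.0, scale=3.0)
m2 = pickle.loads(pickle.dumps(m))
```

This keeps the pickle payload tiny, though the (also expensive) `rv_frozen` creation cost is merely deferred to the first use in each worker.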

I'd be hoping for an order of magnitude less time for either of those.
Line profiling shows that two of the big culprits for slowness during
object creation are `rv_continuous._construct_doc` (50% of the total time,
with a large part spent in `_lib.doccer.docformat`!!) and
`rv_continuous._construct_argparser`.

My questions are:

1) Is it possible to speed up pickling/unpickling of these objects? (e.g.
__setstate__/__getstate__, custom reduction, copyreg magic, ...)
2) Is there any way to turn off docstring creation (or speed it up),
besides starting the interpreter with -OO?
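On question 1, one avenue is a `copyreg` reducer that pickles only the distribution's name plus the frozen args/kwds, and re-freezes via the module-level instance on load instead of unpickling the whole distribution object. A sketch, assuming `rv.dist`, `rv.args` and `rv.kwds` keep their current meaning (they are set in `rv_frozen.__init__` today) and that the distribution is exposed on `scipy.stats` under `rv.dist.name`:

```python
import copyreg
import pickle
import scipy.stats as stats

def _rebuild_rv_frozen(name, args, kwds):
    # Look the distribution up on scipy.stats and re-freeze it, rather
    # than unpickling the full distribution instance (docstrings and all).
    return getattr(stats, name)(*args, **kwds)

def _reduce_rv_frozen(rv):
    # rv_frozen stores the shape args/kwds and the underlying
    # distribution; dist.name is the registered name, e.g. 'norm'.
    return _rebuild_rv_frozen, (rv.dist.name, rv.args, rv.kwds)

# Register for the concrete frozen type; pickle's dispatch table
# matches on the exact class.
copyreg.pickle(type(stats.norm()), _reduce_rv_frozen)

rv = stats.norm(loc=1, scale=2)
rv2 = pickle.loads(pickle.dumps(rv))
```

This shrinks the pickle payload considerably, but the re-freeze on load still pays the `rv_frozen` construction cost, so it only addresses half the problem.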


_____________________________________
Dr. Andrew Nelson

