[SciPy-Dev] creation / pickling of stats distributions

Andrew Nelson andyfaff at gmail.com
Tue Jul 14 19:03:01 EDT 2020

I have some code that uses multiprocessing.Pool for parallelisation. This
requires that an object is pickled. This object has an `rv_frozen`
distribution as an attribute. It turns out that performance is much
improved if the `rv_frozen` distribution is not present, because pickling of
`rv_frozen` objects is expensive. Creation of `rv_frozen` objects is also
expensive:

>>> import scipy.stats as stats
>>> import pickle
>>> %timeit stats.norm(scale=1, loc=1)
694 µs ± 123 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> rv = stats.norm(scale=1, loc=1)
>>> %timeit s = pickle.dumps(rv); pickle.loads(s)
1.02 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
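
The `%timeit` lines above only work inside IPython. A minimal sketch of the same measurement using the standard-library `timeit` module follows. The helper names `per_call_seconds` and `roundtrip_seconds` are hypothetical, and the helpers report the best mean time per call rather than the mean ± std. dev. that `%timeit` prints, so the numbers will not match exactly.

```python
import pickle
import timeit


def per_call_seconds(func, number=1000, repeat=7):
    """Return the best mean time per call (in seconds) over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def roundtrip_seconds(obj, **kwargs):
    """Time pickle.dumps followed by pickle.loads for `obj`."""
    return per_call_seconds(lambda: pickle.loads(pickle.dumps(obj)), **kwargs)


# Example with SciPy installed (not executed here):
#   import scipy.stats as stats
#   per_call_seconds(lambda: stats.norm(scale=1, loc=1))   # creation cost
#   roundtrip_seconds(stats.norm(scale=1, loc=1))          # pickle round trip
```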

I'd be hoping for an order of magnitude less in time for either of those.
Using line profiling, two of the big culprits for slowness during object
creation are `rv_continuous._construct_doc` (50% of the total time, with a
large part spent in `_lib.doccer.docformat`!!) and

My questions are:

1) Is it possible to speed up pickling/unpickling of these objects? (e.g.
__setstate__/__getstate__, custom reduction, copyreg magic, ...)
2) Is there any way to turn off docstring creation (or speed it up),
besides starting the interpreter with -OO?
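
To make question 1 concrete, here is a minimal sketch of one possible `__getstate__`/`__setstate__` approach. It is not an existing SciPy feature. The wrapper class `LazyFrozen` and its `module` parameter are hypothetical names introduced for illustration. The idea is to pickle only the recipe (distribution name plus arguments) and to rebuild the frozen distribution on first use after unpickling.

```python
import importlib
import pickle


class LazyFrozen:
    """Hypothetical wrapper that pickles only the recipe for a frozen distribution."""

    def __init__(self, dist_name, *args, module="scipy.stats", **kwds):
        self.module = module        # module that provides the distribution
        self.dist_name = dist_name  # e.g. "norm"
        self.args = args
        self.kwds = kwds
        self._rv = None             # built lazily, never pickled

    @property
    def rv(self):
        # Build the frozen distribution on first access and cache it.
        if self._rv is None:
            factory = getattr(importlib.import_module(self.module), self.dist_name)
            self._rv = factory(*self.args, **self.kwds)
        return self._rv

    def __getstate__(self):
        # Drop the cached object so only small, plain data is serialised.
        state = self.__dict__.copy()
        state["_rv"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


# Example with SciPy installed (not executed here):
#   lazy = LazyFrozen("norm", loc=1, scale=1)
#   clone = pickle.loads(pickle.dumps(lazy))
#   clone.rv.pdf(0.0)
```

This only removes the pickling cost. The creation cost is still paid once in each process that first touches `rv`, so it helps most when workers reuse the object or never need the distribution at all.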

Dr. Andrew Nelson
