[SciPy-Dev] distributions.py
Jake Vanderplas
vanderplas at astro.washington.edu
Fri Sep 14 16:56:50 EDT 2012
On 09/14/2012 01:49 PM, Ralf Gommers wrote:
>
>
> On Fri, Sep 14, 2012 at 12:48 AM, <josef.pktd at gmail.com> wrote:
>
> On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest
> <vanforeest at gmail.com> wrote:
> > Hi,
> >
> > Now that I understand github (Thanks to Ralf for his explanations in
> > Dutch) and got some simple stuff out of the way in distributions.py I
> > would like to tackle a somewhat harder issue. The function argsreduce
> > is, as far as I can see, too generic. I did some tests to see whether
> > its most generic output, as described by its docstring, is actually
> > swallowed by the callers of argsreduce, but this appears not to be the
> > case.
>
> being generic is not a disadvantage (per se) if it's fast
> https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665
> (and being a one-liner is not a disadvantage either)
>
> Josef
>
> >
> > My motivation to simplify the code in distributions.py (and clean it
> > up) is partly based on making it simpler to understand for myself, but
> > also for others. Since github makes code browsing a much nicer
> > experience, perhaps more people will take a look at what's under the
> > hood. But then the code should also be accessible and clean. Are there
> > any reasons not to pursue this path, and focus on more important
> > problems of the stats library?
>
>
> Not sure that argsreduce is the best place to start (see Josef's
> reply), but there should be things that can be done to make the code
> easier to read. For example, this code is used in ~10 methods of
> rv_continuous:
>
> loc,scale=map(kwds.get,['loc','scale'])
> args, loc, scale = self._fix_loc_scale(args, loc, scale)
> x,loc,scale = map(asarray,(x,loc,scale))
> args = tuple(map(asarray,args))
>
> Some refactoring may be in order. The same is true of the rest of the
> implementation of many of those methods. Some are exactly the same
> except for calls to the corresponding underscored method (example:
> logsf() and logcdf() are identical except for calls to _logsf() and
> _logcdf(), and one nonsensical multiplication).
>
> Ralf
>
>
>
I would say that the most important improvement needed in distributions
is in the documentation. A new user would look at the docstring of,
say, scipy.stats.norm, and have no idea how to proceed. Here's the
current example from the docstring of scipy.stats.norm:
Examples
--------
>>> from scipy.stats import norm
>>> numargs = norm.numargs
>>> [ ] = [0.9,] * numargs
>>> rv = norm()
>>> x = np.linspace(0, np.minimum(rv.dist.b, 3))
>>> h = plt.plot(x, rv.pdf(x))
I don't even know what that means... and it doesn't compile. Also, what
is b? How would I enter mu and sigma to make a normal distribution?
It's all pretty opaque.
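
For comparison, here is a sketch of the kind of example that might answer
those questions. It assumes that, for norm, the loc keyword plays the role
of mu and scale plays the role of sigma; the numeric values are made up for
illustration. As far as I can tell, b is the upper end of the distribution's
support (infinite for the normal), which would explain why the docstring
clips it at 3.

```python
import math

import numpy as np
from scipy.stats import norm

# Illustrative values only: mean (mu) and standard deviation (sigma).
mu, sigma = 2.0, 0.5

# "Freeze" a normal distribution with these parameters.
# Assumption stated above: loc corresponds to mu, scale to sigma.
rv = norm(loc=mu, scale=sigma)

# Evaluate the density on a grid spanning three standard deviations.
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 101)
pdf_values = rv.pdf(x)

# The same density without freezing: pass loc and scale on each call.
pdf_unfrozen = norm.pdf(x, loc=mu, scale=sigma)

# A few summary quantities of the frozen distribution.
mean = rv.mean()       # should equal mu
std = rv.std()         # should equal sigma
half = rv.cdf(mu)      # probability mass below the mean
peak = rv.pdf(mu)      # density at the mean

# Draw a handful of random samples.
samples = rv.rvs(size=5)
```

Plotting would then be a matter of importing matplotlib.pyplot as plt and
calling plt.plot(x, pdf_values); that part is left out here so the snippet
only depends on numpy and scipy.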
Jake