[SciPy-Dev] distributions.py

Jake Vanderplas vanderplas at astro.washington.edu
Fri Sep 14 16:56:50 EDT 2012


On 09/14/2012 01:49 PM, Ralf Gommers wrote:
>
>
> On Fri, Sep 14, 2012 at 12:48 AM, <josef.pktd at gmail.com> wrote:
>
>     On Thu, Sep 13, 2012 at 5:21 PM, nicky van foreest
>     <vanforeest at gmail.com> wrote:
>     > Hi,
>     >
>     > Now that I understand github (Thanks to Ralf for his explanations in
>     > Dutch) and got some simple stuff out of the way in distributions.py I
>     > would like to tackle a somewhat harder issue. The function argsreduce
>     > is, as far as I can see, too generic. I did some tests to see whether
>     > its most generic output, as described by its docstring, is actually
>     > swallowed by the callers of argsreduce, but this appears not to be the
>     > case.
>
>     being generic is not a disadvantage (per se) if it's fast
>     https://github.com/scipy/scipy/commit/4abdc10487d453b56f761598e8e013816b01a665
>     (and being a one-liner is not a disadvantage either)
>
>     Josef
>
>     >
>     > My motivation to simplify the code in distributions.py (and clean it
>     > up) is partly to make it simpler to understand for myself, but
>     > also for others. Since github makes code browsing a much nicer
>     > experience, perhaps more people will take a look at what's under the
>     > hood. But then the code should also be accessible and clean. Are there
>     > any reasons not to pursue this path, and to focus instead on more
>     > important problems of the stats library?
>
>
> Not sure that argsreduce is the best place to start (see Josef's 
> reply), but there should be things that can be done to make the code 
> easier to read. For example, this code is used in ~10 methods of 
> rv_continuous:
>
>         loc,scale=map(kwds.get,['loc','scale'])
>         args, loc, scale = self._fix_loc_scale(args, loc, scale)
>         x,loc,scale = map(asarray,(x,loc,scale))
>         args = tuple(map(asarray,args))
>
> Some refactoring may be in order. The same is true of the rest of the 
> implementation of many of those methods. Some are exactly the same 
> except for calls to the corresponding underscored method (example: 
> logsf() and logcdf() are identical except for calls to _logsf() and 
> _logcdf(), and one nonsensical multiplication).
>
> Ralf
>
I would say that the most important improvement needed in distributions 
is in the documentation.  A new user would look at the docstring of, 
say, scipy.stats.norm, and have no idea how to proceed.  Here's the 
current example from the docstring of scipy.stats.norm:

Examples
--------
 >>> from scipy.stats import norm
 >>> numargs = norm.numargs
 >>> [  ] = [0.9,] * numargs
 >>> rv = norm()

 >>> x = np.linspace(0, np.minimum(rv.dist.b, 3))
 >>> h = plt.plot(x, rv.pdf(x))

I don't even know what that means... and it doesn't compile.  Also, what 
is b?  How would I enter mu and sigma to make a normal distribution?  
It's all pretty opaque.
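
For comparison, here is a rough sketch of the kind of example I would have
hoped to find.  It assumes that mu and sigma map onto the loc and scale
keyword arguments of norm, and that b is the upper bound of the
distribution's support; the values of mu and sigma are made up for
illustration, and the plotting call is left out so the snippet only needs
numpy and scipy:

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (not taken from the docstring).
mu, sigma = 2.0, 0.5

# For norm, loc is the mean and scale is the standard deviation.
# Calling norm(...) with fixed parameters gives a "frozen" distribution.
rv = norm(loc=mu, scale=sigma)

# Evaluate the density on a grid spanning four standard deviations.
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 201)
pdf = rv.pdf(x)

# The frozen object also exposes summary statistics and the cdf.
mean, std = rv.mean(), rv.std()
cdf_at_mean = rv.cdf(mu)

# b is the upper end of the support; for the normal distribution
# the support is the whole real line, so b is infinite.
upper = rv.dist.b
```

Something along these lines would at least tell a new user where the
parameters go, even if the real docstring has to stay generic across
distributions.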
     Jake
