PEP 450 Adding a statistics module to Python

Roy Smith roy at panix.com
Sat Aug 10 07:50:23 EDT 2013


In article <mailman.417.1376104455.1251.python-list at python.org>,
 Skip Montanaro <skip at pobox.com> wrote:

> Given that installing numpy or scipy is generally no more difficult
> than executing "pip install (scipy|numpy)" I'm not really feeling the
> need for a battery here...

I just tried installing numpy in a fresh virtualenv on an Ubuntu Precise 
box.  I ran "pip install numpy".  It took 1.5 minutes.  It printed 
almost 1800 lines of build crap, including 383 warnings and 83 errors.  
For a newbie, that can be pretty intimidating.

That's for the case where I've already installed numpy elsewhere on that 
box, so I already had the Fortran compiler and the rest of the build 
chain.  For fun, I just spun up a new Ubuntu Precise instance in AWS.  
It came pre-installed with Python 2.7.3.  I tried "pip install numpy", 
which told me that pip was not installed.

At least it told me what I needed to do to get pip installed.  
Unfortunately, I didn't read the message carefully enough and typed 
"sudo apt-get install pip", which of course got me another error because 
the correct name of the package is python-pip.  Doing "sudo apt-get 
install python-pip" finally got me to the point where I could start to 
install numpy.

Of course, if I didn't have sudo privs on the box (the norm in most 
corporate environments), I never would have gotten that far.

Next, "sudo pip install numpy" got me a bunch of errors culminating in 
"RuntimeError: Broken toolchain: cannot link a simple C program", and 
no indication of how to get any further.

At this point, most people would give up.  I don't remember the full set 
of steps I needed to take the first time.  Obviously, I would start by 
installing gcc, but I seem to remember there were additional steps 
needed to get Fortran support.

Having some simple statistics baked into the standard Python distribution 
would be a big win.  As shown above, installing numpy can be an 
insurmountable hurdle for people with insufficient sysadmin-fu.

PEP 450 makes cogent arguments for why rolling your own statistics 
routines is fraught with peril.  Looking over our source tree, I see 
we've implemented standard deviation in Python at least twice.  I'm sure 
they're both naive implementations of the sort PEP 450 warns about.
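
The kind of naive implementation PEP 450 warns about can be shown with a 
short sketch (Python 3).  The function names below are hypothetical, and 
Welford's online algorithm is just one well-known numerically stable 
alternative, not necessarily what PEP 450 itself proposes:

```python
import math


def naive_stdev(data):
    """Hypothetical textbook one-pass sample stdev:
    sqrt((sum(x^2) - sum(x)^2 / n) / (n - 1)).

    Subtracting two large, nearly equal sums loses the significant digits
    when the mean is large compared with the spread.
    """
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    var = (total_sq - total * total / n) / (n - 1)
    # Rounding error can even push the "variance" below zero; clamp it so
    # math.sqrt() does not raise.
    return math.sqrt(max(var, 0.0))


def welford_stdev(data):
    """Hypothetical sample stdev using Welford's online algorithm."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two data points")
    return math.sqrt(m2 / (n - 1))


if __name__ == "__main__":
    # Small spread around a large offset: the true sample stdev is 1.0.
    data = [1e9 + x for x in (4.0, 5.0, 6.0)]
    print(naive_stdev(data))    # prints 0.0
    print(welford_stdev(data))  # prints 1.0
```

Here the squares are around 1e18, where adjacent doubles are 128 apart, 
so the "+16", "+25" and "+36" parts of each square are rounded away and 
the two sums cancel exactly.  Welford's update only ever works with 
deviations from the running mean, so it keeps those digits.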

And, yes, backporting to 2.7 would be a big win too.  I know the goal is 
to get everybody onto 3.x, but my list of external pip dependencies 
includes 40 modules.  It's going to be a long and complicated road to get 
to the point where I can move to 3.x, and I imagine most non-trivial 
projects are in a similar situation.
