[Numpy-discussion] Making NumPy accessible to everyone (or no-one) (was Numpy-discussion Digest, Vol 19, Issue 44)

Thu Apr 10 06:55:18 EDT 2008

Hi Joe, all

On 10/04/2008, Joe Harrington <jh at physics.ucf.edu> wrote:
> > Absolutely.  Let's please standardize on:
>  > import numpy as np
>  > import scipy as sp
>
>  I hope we do NOT standardize on these abbreviations.  While a few may
>  have discussed it at a sprint, it hasn't seen broad discussion and
>  there are reasons to prefer the other practice (numpy as N, scipy as
>  S, pylab as P).

"N" is a very unfortunate choice of abbreviation, given that so many
algorithms use it to indicate the number of elements in things.  "np"
is much safer and, like Jarrod mentioned, also only takes two keys to
type.  Sebastian, a simple regexp replace should fix your problem
(investment in hundreds of lines of N.* usage).
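
A minimal sketch of such a replacement, assuming the old alias is only ever used as "N." attribute access. The helper name rename_alias is hypothetical, and the pattern would also rewrite matches inside strings and comments, so the result deserves a quick review:

```python
import re

def rename_alias(source, old="N", new="np"):
    """Rewrite 'import numpy as N' and 'N.' attribute access to use a new alias."""
    # Fix the import line itself.
    source = re.sub(rf"\bimport numpy as {re.escape(old)}\b",
                    f"import numpy as {new}", source)
    # Replace "N." only where it starts a name: the lookbehind skips
    # cases such as "FN." or "obj.N.".
    return re.sub(rf"(?<![\w.]){re.escape(old)}\.", f"{new}.", source)

converted = rename_alias(
    "import numpy as N\na = N.zeros(3) + N.ones(3)\nb = FN.call()\n"
)
```

Only the alias is touched; a name like FN that merely ends in N is left alone.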

>  My reasons for saying this go back to my reasons for
>  disliking lots of heirarchical namespaces at all: if we must have
>  namespaces, let's minimize the visual and typing impact by making them
>  short and visually distinct from the function names (by capitalizing
>  them).

The Python Style Guide (PEP 8) recommends that we stick to lowercase,
underscore-separated names.  We'd do our users a real disservice by
not following community-defined standards.

Namespaces throttle the amount of information with which the user is
presented, and well-thought-out design leads to logical, intuitive
segmentation of functionality.

Searchable documentation and indices then become essential in guiding
the user to the right place.  On the other hand, when I was a
freshman, we had a course on MATLAB; I remember spending countless
hours using "lookfor" (I think that's what it is called?).  That was
one of the effects of a flat namespace.
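
As a rough illustration of searching within one namespace rather than across everything, here is a small sketch that scans the docstrings of a single module for a keyword. The function search_docs is hypothetical, and the standard-library math module stands in for a numpy subpackage:

```python
import math

def search_docs(module, keyword):
    """Return the public callables in `module` whose docstring mentions `keyword`."""
    keyword = keyword.lower()
    hits = []
    for name in dir(module):
        if name.startswith("_"):
            continue
        obj = getattr(module, name)
        # Skip constants such as math.pi; only functions are of interest here.
        if not callable(obj):
            continue
        doc = obj.__doc__ or ""
        if keyword in doc.lower():
            hits.append(name)
    return hits

hits = search_docs(math, "hyperbolic")
```

Because the search is limited to one namespace, the list of hits stays short and on topic.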

For interactive work, a flat namespace may be ideal (and I have no
problem with us providing that as well), but otherwise, for file based
code, I'd much prefer to have a (relatively shallow) namespace
structure.

>  What concerns me about the discussion is that we are still not
>  thinking like communications and thought-process experts, we are
>  thinking like categorizers and accountants.  The arguments we are
>  raising don't have to do, positively or negatively, with the difficult
>  acts of communicating with a computer and with other readers of our
>  code.  Those are the sole purposes of computer languages.

Isn't it easier to explain how to use a well-structured, organised
library, rather than some functions-all-over-the-floor mess?  If an
accountant can import numpy.finance and do his work, how is that more
difficult than importing every possible function included, and then
sifting through them?

>  Namespaces add characters to code that have a high redundancy factor.
>  This means they pollute code, make it slow and inaccurate to read, and
>  making learning harder.  Lines get longer and may wrap if they contain
>  several calls.  It is harder while visually scanning code to
>  distinguish the function name if it's adjacent to a bunch of other
>  text, particularly if that text appears commonly in the nearby code.

Python provides very good machinery for dealing with this verbosity:

import very_foo_module as f
from very.deeply.nested.namespace import func

and even

def foo(args):
    c = commonly_used_func
    result = c(3) + c(4) + 2*c(5)
    return result

At the moment, everyone warns against using '*' with numpy, but with
proper namespaces, the * can be quite handy:

from numpy.math import *

a = sin(theta) + 3*cos(theta**2)

(the example above already works in current numpy)

>  It therefore becomes harder to spot bugs.  Mathematical code becomes
>  less and less like the math expressions we write on paper when doing
>  derivations, making it harder to interpret and verify.  You have to
>  memorize which subpackage each function is in, which is hard to do for
>  those functions that could naturally go in two subpackages.

If you need to use a subset of functions defined over different
namespaces, it is very easy to create a custom module, say
my_field_of_study.py:

from numpy.math import cosh, sinh
from numpy.linalg import inv

etc.

Then, a simple "from my_field_of_study import *" provides you with
everything you need.  This needs to be done once in your life, and can
be advocated as a Cookbook recipe.  Memorisation be gone (but who
needs to memorise with TAB-completion anyway?).
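
A self-contained sketch of that recipe follows.  Two things are assumptions made for the sake of a runnable example: standard-library functions (from math and statistics) stand in for the numpy subpackages named above, and the module is built in memory instead of being saved as my_field_of_study.py.  The star import is run through exec() only so that the imported names can be inspected afterwards:

```python
import sys
import types
from math import cosh, sinh
from statistics import mean

# Build the collection module in memory; in practice this would simply be
# a file called my_field_of_study.py containing the import lines.
mod = types.ModuleType("my_field_of_study")
mod.cosh, mod.sinh, mod.mean = cosh, sinh, mean
# __all__ controls exactly what "import *" hands to the user.
mod.__all__ = ["cosh", "sinh", "mean"]
sys.modules["my_field_of_study"] = mod

# Simulate a user script doing the star import.
namespace = {}
exec("from my_field_of_study import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
```

Listing the names in __all__ keeps the star import predictable: the user gets the chosen functions and nothing else.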

>  While
>  many math function names are obvious, subpackage names are not.  Is it
>  .stat or .stats or .statistics?  .rand or .random?  .fin or
>  .financial?  Some functions have this problem, but *every* namespace
>  name has it in spades.

Introspection is such a joy with IPython, or with the SAGE notebook,
and many editors even provide similar functionality.  Stuffing
domain-specific functions into a flat namespace sounds like the ideal
way of confusing a new user.

> There is simply
>  no reduction in readability, writeability, or debugability if you
>  don't have namespace prefixes on everything, and knowing you know
>  everything is easily accomplished now with the online categorized
>  function list.

You're right, we read code much more often than we write it.  Even so,
we seldom read thousands of lines of code, and often focus on a narrow
part of the module, where the namespace provides context about the
origins of the function being used. The penalty on writability isn't
severe, in my experience.  As for debugging, I'm reminded of the speed
issues with numpy.sum and Python's built-in sum.
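
The same kind of confusion can be reproduced with the standard library alone.  The sketch below is an analogue, not the numpy.sum case itself: a star import from math silently replaces the built-in pow, and without a namespace prefix nothing in the calling code shows which one is in use.

```python
import builtins
import math

# Simulate a script that starts with "from math import *".
namespace = {}
exec("from math import *", namespace)

# The bare name "pow" now refers to math.pow, not the built-in.
shadowed = namespace["pow"] is math.pow

# The two functions behave differently: the built-in keeps integers exact,
# while math.pow always converts to float.
builtin_result = builtins.pow(2, 10)
math_result = math.pow(2, 10)
```

With an explicit prefix (math.pow versus pow) the difference is visible right where the call is made.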

>  We can incorporate that functionality into the doc
>  reading apparatus ("help", currently) by using keywords in ReST
>  comments in the docstrings and providing a way for "help" and its
>  friends to list the keywords and what functions are connected to them.

Good idea.

>  What nobody has said is "if we have lots of namespaces, my code will
>  look prettier" or "if we have lots of namespaces, normal people will
>  learn faster" or "if we have lots of namespaces, my code will be
>  easier to verify and debug".  I don't believe any of these statements
>  to be true.  Do you?

Given that "lots" means "the necessary amount", I'd answer "yes" to 1,
2 and 3. 1) More organised, 2) therefore easier to fit into my mind
and 3) therefore easier to verify and debug.

>  Similarly, nobody has said, "if we have lots of namespaces, I'll be a
>  faster coder".

We're not pushing for lots of namespaces, but for a fitting number
(probably very few in the case of numpy).  The impact on coding speed
should be minimal.

>  There is a *very* high obnoxiousness factor in typing
>  redundant stuff at an interpreter.

For the interpreter I'd suggest a different structure (i.e. an IPython
profile, importing my_field_of_study.py :).

>  users *hate* that you have to type "print, " in order to inspect the
>  contents of a variable.

Again, IPython -- we have the right tool for the job.

>  The reasons we all like Python relate to how quick and easy it is to
>  emit code from our fingertips that is similar to what we are thinking
>  in our brains, compared to other languages.

And your brain has a very small cache, so use it with care.

>  documentation does not fix.  I'd rather flatten scipy, however,
>  because the main reason to have namespaces is still satisfied that
>  way.

I shudder to think of a flattened SciPy.

>  What worries me is that the EXPERIENCE of reading and writing code in
>  Python is not much being raised in this discussion, when it should be
>  the *key* topic of any argument about the direction of the language.

Agreed.  Fortunately, we already have many talented developers working
on tools to make that part easier.

>  So, in closing, I'd like to exhort everyone to try harder to think
>  like a sociologist, psychologist, and linguist in addition to thinking
>  like a computer scientist, physicist, or mathematician.

You're asking a lot -- and I don't know if that is fair.  Surely, we
should expect users of numpy to think a little bit like computer
scientists, physicists, engineers and mathematicians as well?

Regards
Stéfan