[Numpy-discussion] Numpy-discussion Digest, Vol 19, Issue 44

Joe Harrington jh at physics.ucf.edu
Thu Apr 10 02:21:05 EDT 2008


> Absolutely.  Let's please standardize on:
> import numpy as np
> import scipy as sp

I hope we do NOT standardize on these abbreviations.  While a few may
have discussed it at a sprint, it hasn't seen broad discussion and
there are reasons to prefer the other practice (numpy as N, scipy as
S, pylab as P).  My reasons for saying this go back to my reasons for
disliking lots of hierarchical namespaces at all: if we must have
namespaces, let's minimize the visual and typing impact by making them
short and visually distinct from the function names (by capitalizing
them).

What concerns me about the discussion is that we are still not
thinking like communications and thought-process experts, we are
thinking like categorizers and accountants.  The arguments we are
raising don't have to do, positively or negatively, with the difficult
acts of communicating with a computer and with other readers of our
code.  Those are the sole purposes of computer languages.

Namespaces add characters to code that have a high redundancy factor.
This means they pollute code, make it slow and inaccurate to read, and
make learning harder.  Lines get longer and may wrap if they contain
several calls.  It is harder while visually scanning code to
distinguish the function name if it's adjacent to a bunch of other
text, particularly if that text appears commonly in the nearby code.
It therefore becomes harder to spot bugs.  Mathematical code becomes
less and less like the math expressions we write on paper when doing
derivations, making it harder to interpret and verify.  You have to
memorize which subpackage each function is in, which is hard to do for
those functions that could naturally go in two subpackages.  While
many math function names are obvious, subpackage names are not.  Is it
.stat or .stats or .statistics?  .rand or .random?  .fin or
.financial?  Some functions have this problem, but *every* namespace
name has it in spades.

The arguments people are raising are arguments related to how
emotionally satisfying it is to have a place for everything and
everything in its place, and to know you know everything there is to
know.  While we like both those things, as scientists, engineers, and
mathematicians, they are almost irrelevant to coding.  There is simply
no reduction in readability, writeability, or debugability if you
don't have namespace prefixes on everything, and knowing you know
everything is easily accomplished now with the online categorized
function list.  We can incorporate that functionality into the doc
reading apparatus ("help", currently) by using keywords in ReST
comments in the docstrings and providing a way for "help" and its
friends to list the keywords and what functions are connected to them.
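One way the keyword idea could look, as a minimal sketch only: the `.. keywords:` line format and the `keyword_index` helper below are hypothetical, not an existing NumPy, pydoc, or "help" feature. A line such as `.. keywords: statistics` is a reST comment (it is not a directive, which would need `::`), so it stays out of the rendered documentation, yet a tool can still read it from the docstring and build the categorized list.

```python
import inspect
import re
from collections import defaultdict

# Hypothetical convention: a reST comment line like ".. keywords: a, b" in a docstring.
_KEYWORD_LINE = re.compile(r"^[ \t]*\.\.[ \t]+keywords:[ \t]*(.+)$", re.MULTILINE)


def keyword_index(funcs):
    """Map each keyword to the sorted names of the functions tagged with it."""
    index = defaultdict(set)
    for func in funcs:
        doc = inspect.getdoc(func) or ""  # getdoc also dedents the docstring
        for match in _KEYWORD_LINE.finditer(doc):
            for kw in match.group(1).split(","):
                kw = kw.strip()
                if kw:
                    index[kw].add(func.__name__)
    return {kw: sorted(names) for kw, names in sorted(index.items())}


def rms(x):
    """Root-mean-square of a sequence.

    .. keywords: statistics, reduction
    """
    return (sum(v * v for v in x) / len(x)) ** 0.5


def mean(x):
    """Arithmetic mean of a sequence.

    .. keywords: statistics
    """
    return sum(x) / len(x)


print(keyword_index([rms, mean]))
# {'reduction': ['rms'], 'statistics': ['mean', 'rms']}
```

A "help"-style browser could call something like this over a package's public functions to list keywords and the functions connected to them, with no namespace layer involved.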

What nobody has said is "if we have lots of namespaces, my code will
look prettier" or "if we have lots of namespaces, normal people will
learn faster" or "if we have lots of namespaces, my code will be
easier to verify and debug".  I don't believe any of these statements
to be true.  Do you?

Similarly, nobody has said, "if we have lots of namespaces, I'll be a
faster coder".  There is a *very* high obnoxiousness factor in typing
redundant stuff at an interpreter.  It's already annoying to type
N.sin instead of sin, but N.T.sin?  Or worse, np.tg.sin?  Now the
prefix has twice the characters of the function itself!  Most IDL
users *hate* that you have to type "print, " in order to inspect the
contents of a variable.  Yet, with multiple layers of namespaces we'd
have lots more than seven extra characters on most lines of code, and
unlike the IDL mess, you'd have to *think* to recall the right extra
characters for each function call, instead of just telling your hands
to run the "print, " finger macro once again.
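For a rough sense of the arithmetic, this sketch counts prefix characters against function-name characters for the calls mentioned above. `N.T` and `np.tg` are the hypothetical subpackages from this paragraph, not real NumPy modules, so the code only manipulates strings.

```python
def prefix_cost(call):
    """Return (characters typed for namespace prefixes, characters in the function name)."""
    prefix, dot, name = call.rpartition(".")
    # rpartition splits off the last dot; count that dot as part of the prefix.
    return len(prefix) + len(dot), len(name)


for call in ["sin", "N.sin", "N.T.sin", "np.tg.sin"]:
    print(call, prefix_cost(call))
# sin (0, 3)
# N.sin (2, 3)
# N.T.sin (4, 3)
# np.tg.sin (6, 3)

print(len("print, "))  # 7, the IDL overhead for comparison
```

With two levels of lowercase prefixes, the prefix is already twice the length of `sin`.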

The reasons we all like Python relate to how quick and easy it is to
emit code from our fingertips that is similar to what we are thinking
in our brains, compared to other languages.  The brain doesn't declare
variables, nor run loops over arrays.  Neither does Python.  When we
average up the rows of a 2D array and subtract that average from the
image, we don't first imagine making a new 2D array by repeating the
averaged row, and neither does Python; it just broadcasts behind the
scenes.  I could go on, and so could all of you.  Python feels more
like thought than other languages.
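A small illustration of the row-averaging case, written with the capital `N` abbreviation argued for here; the 3x4 array is made-up data. Subtracting a shape-(4,) row from a shape-(3, 4) array broadcasts the row across every row of the image, which gives the same result as the explicitly tiled version nobody pictures.

```python
import numpy as N

image = N.arange(12.0).reshape(3, 4)   # made-up 3x4 "image"
avg_row = image.mean(axis=0)           # average of the 3 rows: shape (4,)
residual = image - avg_row             # broadcasting reuses avg_row for each row

# The explicit version: build a 3x4 array by repeating the averaged row first.
tiled = N.tile(avg_row, (image.shape[0], 1))
print(N.array_equal(residual, image - tiled))  # True
print(residual[0])                             # [-4. -4. -4. -4.]
```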

But now we are talking about breaking this commitment to lightness of
code text, learnability, readability, and debugability by adding layer
upon layer of prefixes to all the functions we write.

There is a vital place for namespaces.  Using import *, or not having
namespaces at all, has unpredictable consequences, especially in the
future when someone may add a function with a name identical to one
you are using to one of the packages you import, breaking existing
code.  Namespaces make it possible for two developers who are not in
communication to produce different packages that contain the same
names, and not worry.  This is critical in open source, so we live
with it or we go back to declaring our functions, as in C.  We can
reduce the impact by sticking with short, distinctive abbreviations
(capital N rather than lowercase np) and by not going hierarchical.
Where we need multiple packages, we should put them at the top level
rather than nesting them.  I'll go so far as to suggest that if scipy must
have multiple packages within it, we could have them each be their own
top-level package, and drop the "scipy." (or "S.", or "sp.") prefix
entirely.  They can still be tested as a unit and released together if
we want that.  There is no problem with doing it this way that good
documentation does not fix.  I'd rather flatten scipy, however,
because the main reason to have namespaces is still satisfied that
way.  Of course, we should break the docs down as it's currently
packaged, for easier learning and management.  We just don't have to
instantiate that into the language itself.
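To make the `import *` hazard concrete, here is a sketch using two standard-library modules, `math` and `cmath`, that both export `sqrt`. Whichever star import runs last silently wins, which is the same mechanism by which a name added to a package in a future release can change the meaning of existing code.

```python
# Both modules define sqrt; the second star import silently rebinds the name.
from math import *    # sqrt is now math.sqrt: real-valued, rejects negatives
from cmath import *   # sqrt is now cmath.sqrt: complex-valued

print(sqrt(-1))       # 1j, where math.sqrt(-1) would have raised ValueError

# With namespace prefixes, each call states which function it means.
import math
import cmath

print(cmath.sqrt(-1))  # 1j
try:
    math.sqrt(-1)
except ValueError as err:
    print("math.sqrt rejected -1:", err)
```

Nothing warns about the rebinding; only the prefixed calls keep both functions unambiguous.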

What worries me is that the EXPERIENCE of reading and writing code in
Python is not much being raised in this discussion, when it should be
the *key* topic of any argument about the direction of the language.
So, in closing, I'd like to exhort everyone to try harder to think
like a sociologist, psychologist, and linguist in addition to thinking
like a computer scientist, physicist, or mathematician.  A computer
language is a means for communicating with a computer, and with others
who may use the code later.  We use languages like Python over the
much-faster assembly for a single reason: We spend too much time
coding, and it is faster and more accurate for the author and reader
to produce and consume code in Python than in assembly - or any other
language.

Let our guiding principle be to make this ever more true.

--jh--


