[Numpy-discussion] RE: default axis for numarray

Mon Jun 10 18:56:03 EDT 2002

I have to admit that I agree with all of what Eric has to say
here -- even if it does cause some code breakage (I'm certainly
willing to do some maintenance on my code/modules that are
floating here and there so long as things continue to improve
with the language as a whole).

I do think consistency is a very important aspect of getting
Numeric/Numarray accepted by a larger user base (and believe
me, my collaborators are probably sick of my Numeric Python
evangelism (but I like to think also a bit jealous of my NumPy
usage as they continue struggling with one-off C and Fortran
routines...)).

Another example of a glaring inconsistency in the current
implementation is this little number that has been bugging me
for a while:

>>> arange(10, typecode='d')
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> ones(10, typecode='d')
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

Anyway, these little warts that we are discussing probably
haven't kept my astronomer friends from switching from IDL, but
as things progress and well-known astronomical or other
scientific software packages are released based on Python (like
pyraf) from well-known groups (like STScI/NASA), they will
certainly take a closer look.

On a slightly different note, my hearty thanks to all the
developers for all of your hard work so far.
Numeric/Numarray+Python is a fantastic platform for scientific
computation.

Cheers,

Scott

On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote:
> So one contentious issue a day isn't enough, huh? :-)
> 
> > An issue that has been raised by scipy (most notably Eric Jones
> > and Travis Oliphant) has been whether the default axis used by
> > various functions should be changed from the current Numeric
> > default. This message is not directed at determining whether we
> > should change the current Numeric behavior for Numeric, but whether
> > numarray should adopt the same behavior as the current Numeric.
> > 
> > To be more specific, certain functions and methods, such as
> > add.reduce(), operate by default on the first axis. For example,
> > if x is a 2 x 10 array, then add.reduce(x) results in a
> > 10 element array, where elements in the first dimension have
> > been summed over rather than the most rapidly varying dimension.
> > 
> > >>> x = arange(20)
> > >>> x.shape = (2,10)
> > >>> x
> > array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
> >        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> > >>> add.reduce(x)
> > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
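
To make the two candidate defaults concrete, here is a rough pure-Python sketch of both reductions on the same 2 x 10 data. Nested lists stand in for the array, and `reduce_first_axis` and `reduce_last_axis` are hypothetical helper names, not Numeric or numarray functions:

```python
from functools import reduce
from operator import add

# Nested lists stand in for the 2 x 10 array from the example above.
x = [list(range(0, 10)), list(range(10, 20))]


def reduce_first_axis(rows):
    """Sum over the first axis: one result per column."""
    return [reduce(add, column) for column in zip(*rows)]


def reduce_last_axis(rows):
    """Sum over the last axis: one result per row."""
    return [reduce(add, row) for row in rows]


print(reduce_first_axis(x))  # 10 values, the current add.reduce(x) behavior
print(reduce_last_axis(x))   # 2 values, what an axis=-1 default would give
```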
> 
> The issue here is both consistency across a library and speed.
> 
> From the numpy.pdf, Numeric looks to have about 16 functions using
> axis=0 (or index=0 which should really be axis=0) and, counting FFT,
> about 10 functions using axis=-1.  To this day, I can't remember which
> functions use which and have resorted to explicitly using axis=-1 in my
> code.  Unfortunately, many of the Numeric functions that should still
> don't take axis as a keyword, so you end up just inserting -1 in the
> argument list (but this is a different issue -- it just needs to be
> fixed).
> 
> SciPy always uses axis=-1 for operations.  There are 60+ functions with
> this convention.  Choosing -1 offers the best cache use and therefore
> should be more efficient.  Defaulting to the fastest behavior is
> convenient because new users don't need any special knowledge of
> Numeric's implementation to get near peak performance.  Also, there is
> never a question about which axis is used for calculations.
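
The cache argument can be sketched by looking at where elements sit in memory. The helper below is hypothetical and assumes a contiguous row-major (C-order) buffer, which is an assumption about the array's storage rather than something stated in this thread:

```python
def flat_offset(index, shape):
    """Offset of a multi-dimensional index in a row-major (C-order) buffer."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


shape = (2, 10)

# Walking the last axis touches neighbouring memory locations.
last_axis = [flat_offset((0, j), shape) for j in range(4)]

# Walking the first axis jumps a whole row at each step.
first_axis = [flat_offset((i, 0), shape) for i in range(2)]

print(last_axis, first_axis)
```

Under that layout, a loop over the last axis reads consecutive offsets, while a loop over the first axis strides by the row length.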
> 
> When using SciPy and Numeric, their function sets are completely
> co-mingled.  When adding SciPy and Numeric's function counts together,
> it is 70 to 16 for axis=-1 vs. axis=0.  Even though SciPy chose a
> standard, it is impossible for the interface to become intuitive because
> of the exceptions to the rule from Numeric.  
> 
> So here's what I think.  All functions should default to the same axis so
> that the interface to common functions can become second nature for new
> users and experts alike.  Further, the chosen axis should be the most
> efficient for the most cases.
> 
> There are actually a few functions that, taken in isolation, I think
> should have axis=0.  take() is an example.  But, for the sake of
> consistency, it too should use axis=-1.
> 
> It has been suggested to recommend that new users always specify axis=?
> as a keyword in functions that require an axis argument.  This might be
> fine when writing modules, but always having to type:
> 
> 	>>> sum(a,axis=-1)
> 
> in command line mode is a real pain.
> 
> Just a point about the larger picture here...  The changes we're
> discussing are intended to clean up the warts on Numeric -- and, as good
> as it is overall, these are warts in terms of usability.  Interfaces
> should be consistent across a library.  The return types from functions
> should be consistent regardless of input type (or shape). Default
> arguments to the same keyword should also be consistent across
> functions. Some issues are left to debate (i.e. using axis=-1 or axis=0
> as default, returning arrays or scalars from Numeric functions and
> indexing), but the choice made should be applied as consistently as
> possible.
> 
> We should also strive to make it as easy as possible to write generic
> functions that work for all array types (Int, Float, Float32, Complex,
> etc.) -- yet another debate to come.  
> 
> Changes are going to create some backward incompatibilities and that is
> definitely a bummer.  But some changes are also necessary before the
> community gets big.  I know the community is already a reasonable size,
> but I also believe, based on the strength of Python, Numeric, and
> libraries such as Scientific and SciPy, the community can grow by 2
> orders of magnitude over the next five years.  This kind of growth can't
> occur if only savvy developers see the benefits of the elegant language.
> It can only occur if the general scientist sees Python as a compelling
> alternative to Matlab (and IDL) as their day-in/day-out command line
> environment for scientific/engineering analysis.  Making the interface
> consistent is one of several steps to making Python more attractive to
> this community.
> 
> Whether the changes made for numarray should be migrated back into
> Numeric is an open question.  I think they should, but see Konrad's
> counterpoint.  I'm willing for SciPy to be the intermediate step in the
> migration between the two, but also think that is sub-optimal.
> 
> > 
> > Some feel it is contrary to expectations that the least rapidly
> > varying dimension should be operated on by default. There are
> > good arguments for both sides. For example, Konrad Hinsen has
> > argued that the current behavior is most compatible with the behavior
> > of other Python sequences. For example,
> > 
> > >>> sum = 0
> > >>> for subarr in x:
> > ...     sum += subarr
> > 
> > acts on the first axis in effect. Likewise
> > 
> > >>> reduce(add, x)
> > 
> > does likewise. In this sense, Numeric is currently more consistent
> > with Python behavior. However, there are other functions that
> > operate on the most rapidly varying dimension. Unfortunately
> > I cannot currently access my old mail, but I think the rule
> > that was proposed under this argument was that if the 'reduction'
> > operation was of a structural kind, the first dimension is used.
> > If the reduction or processing step is 'time-series' oriented
> > (e.g., FFT, convolve) then the last dimension is the default.
> > On the other hand, some feel it would be much simpler to understand
> > if the last axis was the default always.
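
The Python-sequence argument above can be sketched without any array library. Plain lists concatenate under +, so `elementwise_add` below is a hypothetical helper standing in for array addition:

```python
from functools import reduce


def elementwise_add(a, b):
    """Add two equal-length sequences element by element, as arrays would."""
    return [p + q for p, q in zip(a, b)]


x = [[0, 1, 2], [10, 11, 12]]

# Iterating over a nested sequence yields its rows, i.e. steps along the first axis.
total = [0, 0, 0]
for subarr in x:
    total = elementwise_add(total, subarr)

# reduce() folds over the same rows, so it also collapses the first axis.
folded = reduce(elementwise_add, x)

print(total, folded)
```

Both forms collapse the first axis because that is the only axis the iteration protocol exposes.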
> > 
> > The question is whether there is a consensus for one approach or
> > the other. We raised this issue at a scientific Birds-of-a-Feather
> > session at the last Python Conference. The sense I got there was
> > that most were for the status quo, keeping the behavior as it is
> > now. Is the same true here? In the absence of consensus or a
> > convincing majority, we will keep the behavior the same for backward
> > compatibility purposes.
> 
> Obviously, I'm more opinionated about this now than I was then.  I
> really urge you to consider using axis=-1 everywhere.  SciPy is not the
> only scientific library, but I think it adds the most functions with a
> similar signature (the stats module is full of them).  I very much hope
> for a consistent interface across all of Python's scientific functions
> because command line users aren't going to care whether sum() and
> kurtosis() come from different libraries, they just want them to behave
> consistently.
> 
> eric 
> 
> > 
> > Perry
> 

-- 
Scott M. Ransom              Address:  McGill Univ. Physics Dept.
Phone:  (514) 398-6492                 3600 University St., Rm 338
email:  ransom at physics.mcgill.ca       Montreal, QC  Canada H3A 2T8 
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989