[Numpy-discussion] RE: default axis for numarray
Scott Ransom
ransom at physics.mcgill.ca
Mon Jun 10 18:56:03 EDT 2002
I have to admit that I agree with all of what Eric has to say
here -- even if it does cause some code breakage (I'm certainly
willing to do some maintenance on my code/modules that are
floating here and there so long as things continue to improve
with the language as a whole).
I do think consistency is a very important aspect of getting
Numeric/Numarray accepted by a larger user base (and believe
me, my collaborators are probably sick of my Numeric Python
evangelism (but I like to think also a bit jealous of my NumPy
usage as they continue struggling with one-off C and Fortran
routines...)).
Another example of a glaring inconsistency in the current
implementation is this little number that has been bugging me
for a while:
>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
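What consistent behavior would look like can be sketched without Numeric at all. The following is only an illustration built on the standard library's array module; make_arange, make_ones and make_zeros are hypothetical names, not Numeric functions. The point is that every constructor accepts typecode both positionally and as a keyword.

```python
from array import array

def make_arange(n, typecode='l'):
    """Return 0, 1, ..., n-1 stored with the given typecode."""
    return array(typecode, range(n))

def make_ones(n, typecode='l'):
    """Return n ones stored with the given typecode."""
    return array(typecode, [1]) * n

def make_zeros(n, typecode='l'):
    """Return n zeros stored with the given typecode."""
    return array(typecode, [0]) * n

# Every constructor behaves the same way for both calling styles.
for make in (make_arange, make_ones, make_zeros):
    assert make(10, 'd') == make(10, typecode='d')
```

With one shared signature there is nothing to memorize per function, which is the whole complaint above.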
Anyway, these little warts that we are discussing probably
haven't kept my astronomer friends from switching from IDL, but
as things progress and well-known astronomical or other
scientific software packages are released based on Python (like
pyraf) from well-known groups (like STScI/NASA), they will
certainly take a closer look.
On a slightly different note, my hearty thanks to all the
developers for all of your hard work so far.
Numeric/Numarray+Python is a fantastic platform for scientific
computation.
Cheers,
Scott
On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote:
> So one contentious issue a day isn't enough, huh? :-)
>
> > An issue that has been raised by scipy (most notably Eric Jones
> > and Travis Oliphant) has been whether the default axis used by
> > various functions should be changed from the current Numeric
> > default. This message is not directed at determining whether we
> > should change the current Numeric behavior for Numeric, but whether
> > numarray should adopt the same behavior as the current Numeric.
> >
> > To be more specific, certain functions and methods, such as
> > add.reduce(), operate by default on the first axis. For example,
> > if x is a 2 x 10 array, then add.reduce(x) results in a
> > 10 element array, where elements in the first dimension have
> > been summed over rather than the most rapidly varying dimension.
> >
> > >>> x = arange(20)
> > >>> x.shape = (2,10)
> > >>> x
> > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
> >        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> > >>> add.reduce(x)
> > array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
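For readers without Numeric at hand, the two candidate defaults can be sketched in plain Python on nested lists. This is only an illustration of the two behaviors under discussion; reduce_first_axis and reduce_last_axis are hypothetical helper names, not Numeric functions.

```python
from functools import reduce
from operator import add

# The same 2 x 10 data as above, as a list of two rows.
x = [list(range(0, 10)), list(range(10, 20))]

def reduce_first_axis(rows):
    """Sum over the first axis: combine whole rows element by element."""
    return reduce(lambda acc, row: list(map(add, acc, row)), rows)

def reduce_last_axis(rows):
    """Sum over the last axis: collapse each row to a single number."""
    return [reduce(add, row) for row in rows]

first = reduce_first_axis(x)  # 10 elements, matches add.reduce(x) above
last = reduce_last_axis(x)    # 2 elements, one per row
```

Here first has the ten values shown for add.reduce(x), while last holds one total per row.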
>
> The issue here is both consistency across a library and speed.
>
> From the numpy.pdf, Numeric looks to have about 16 functions using
> axis=0 (or index=0 which should really be axis=0) and, counting FFT,
> about 10 functions using axis=-1. To this day, I can't remember which
> functions use which and have resorted to explicitly using axis=-1 in my
> code. Unfortunately, many of the Numeric functions that should take axis
> as a keyword still don't, so you end up just inserting -1 in the
> argument list (but this is a different issue -- it just needs to be
> fixed).
>
> SciPy always uses axis=-1 for operations. There are 60+ functions with
> this convention. Choosing -1 offers the best cache use and therefore
> should be more efficient. Defaulting to the fastest behavior is
> convenient because new users don't need any special knowledge of
> Numeric's implementation to get near peak performance. Also, there is
> never a question about which axis is used for calculations.
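The cache argument rests on how the elements sit in memory. A minimal sketch, assuming a contiguous row-major (C-order) buffer, which is an assumption about storage rather than something stated in the message; flat_offset is a hypothetical helper, not part of Numeric or SciPy.

```python
def flat_offset(index, shape):
    """Offset of a multi-dimensional index in a row-major (C-order) buffer."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset

shape = (2, 10)

# Walking along the last axis touches neighbouring memory locations.
along_last = [flat_offset((0, j), shape) for j in range(shape[1])]

# Walking along the first axis jumps a whole row at every step.
along_first = [flat_offset((i, 0), shape) for i in range(shape[0])]
```

Under that layout, stepping along the last axis moves one element at a time, while stepping along the first axis skips shape[1] elements per step, which is why a last-axis default tends to be friendlier to the cache.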
>
> When using SciPy and Numeric, their function sets are completely
> co-mingled. When adding SciPy and Numeric's function counts together,
> it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a
> standard, it is impossible for the interface to become intuitive because
> of the exceptions to the rule from Numeric.
>
> So here's what I think. All functions should default to the same axis so
> that the interface to common functions can become second nature for new
> users and experts alike. Further, the chosen axis should be the most
> efficient for the most cases.
>
> There are actually a few functions that, taken in isolation, I think
> should have axis=0. take() is an example. But, for the sake of
> consistency, it too should use axis=-1.
>
> It has been suggested to recommend that new users always specify axis=?
> as a keyword in functions that require an axis argument. This might be
> fine when writing modules, but always having to type:
>
> >>> sum(a,axis=-1)
>
> in command line mode is a real pain.
>
> Just a point about the larger picture here... The changes we're
> discussing are intended to clean up the warts on Numeric -- and, as good
> as it is overall, these are warts in terms of usability. Interfaces
> should be consistent across a library. The return types from functions
> should be consistent regardless of input type (or shape). Default
> arguments to the same keyword should also be consistent across
> functions. Some issues are left to debate (i.e. using axis=-1 or axis=0
> as default, returning arrays or scalars from Numeric functions and
> indexing), but the choice made should be applied as consistently as
> possible.
>
> We should also strive to make it as easy as possible to write generic
> functions that work for all array types (Int, Float, Float32, Complex,
> etc.) -- yet another debate to come.
>
> Changes are going to create some backward incompatibilities and that is
> definitely a bummer. But some changes are also necessary before the
> community gets big. I know the community is already a reasonable size,
> but I also believe, based on the strength of Python, Numeric, and
> libraries such as Scientific and SciPy, the community can grow by 2
> orders of magnitude over the next five years. This kind of growth can't
> occur if only savvy developers see the benefits of the elegant language.
> It can only occur if the general scientist sees Python as a compelling
> alternative to Matlab (and IDL) as their day-in/day-out command line
> environment for scientific/engineering analysis. Making the interface
> consistent is one of several steps to making Python more attractive to
> this community.
>
> Whether the changes made for numarray should be migrated back into
> Numeric is an open question. I think they should, but see Konrad's
> counterpoint. I'm willing for SciPy to be the intermediate step in the
> migration between the two, but also think that is sub-optimal.
>
> >
> > Some feel it is contrary to expectations that the least rapidly
> > varying dimension should be operated on by default. There are
> > good arguments for both sides. For example, Konrad Hinsen has
> > argued that the current behavior is most compatible with the behavior
> > of other Python sequences. For example,
> >
> > >>> sum = 0
> > >>> for subarr in x:
> > ...     sum += subarr
> >
> > acts on the first axis in effect. Likewise
> >
> > >>> reduce(add, x)
> >
> > does likewise. In this sense, Numeric is currently more consistent
> > with Python behavior. However, there are other functions that
> > operate on the most rapidly varying dimension. Unfortunately
> > I cannot currently access my old mail, but I think the rule
> > that was proposed under this argument was that if the 'reduction'
> > operation was of a structural kind, the first dimension is used.
> > If the reduction or processing step is 'time-series' oriented
> > (e.g., FFT, convolve) then the last dimension is the default.
> > On the other hand, some feel it would be much simpler to understand
> > if the last axis was the default always.
> >
> > The question is whether there is a consensus for one approach or
> > the other. We raised this issue at a scientific Birds-of-a-Feather
> > session at the last Python Conference. The sense I got there was
> > that most were for the status quo, keeping the behavior as it is
> > now. Is the same true here? In the absence of consensus or a
> > convincing majority, we will keep the behavior the same for backward
> > compatibility purposes.
>
> Obviously, I'm more opinionated about this now than I was then. I
> really urge you to consider using axis=-1 everywhere. SciPy is not the
> only scientific library, but I think it adds the most functions with a
> similar signature (the stats module is full of them). I very much hope
> for a consistent interface across all of Python's scientific functions
> because command line users aren't going to care whether sum() and
> kurtosis() come from different libraries, they just want them to behave
> consistently.
>
> eric
>
> >
> > Perry
>
--
Scott M. Ransom Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492 3600 University St., Rm 338
email: ransom at physics.mcgill.ca Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989