[Numpy-discussion] Datarray BoF, part2

Keith Goodman kwgoodman at gmail.com
Wed Jul 21 12:37:51 EDT 2010


About a dozen people attended what was billed as a continuation of the
SciPy 2010 datarray BoF. We met at UC Berkeley on July 19 as part of
the py4science series.

A datarray is a subclass of a Numpy array that adds the ability to
label the axes and to label the elements along each axis.
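To make the idea concrete, here is a minimal sketch of an ndarray subclass that carries axis names and per-axis tick labels. The class name LabeledArray and its constructor arguments are hypothetical and are not the datarray API. Labels are only metadata attached at construction time.

```python
import numpy as np


class LabeledArray(np.ndarray):
    """Minimal ndarray subclass with axis names and per-axis tick labels.

    Hypothetical illustration only; this is not the datarray class.
    """

    def __new__(cls, data, axis_names=None, ticks=None):
        obj = np.asarray(data).view(cls)
        n = obj.ndim
        # One name per axis (e.g. "time") and one tick list per axis.
        obj.axis_names = list(axis_names) if axis_names is not None else [None] * n
        obj.ticks = ([None if t is None else list(t) for t in ticks]
                     if ticks is not None else [None] * n)
        if len(obj.axis_names) != n or len(obj.ticks) != n:
            raise ValueError("need one axis name and one tick list per axis")
        for size, t in zip(obj.shape, obj.ticks):
            if t is not None and len(t) != size:
                raise ValueError("tick list length must match axis length")
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and ufunc results. This sketch does not
        # track how labels should follow such operations, so it drops them.
        self.axis_names = None
        self.ticks = None


darr = LabeledArray([[1, 2, 3], [4, 5, 6]],
                    axis_names=["city", "time"],
                    ticks=[["LA", "SF"], [2008, 2009, 2010]])
```

Because the object is still an ndarray, darr[0] and darr.sum(axis=1) behave exactly as they would for a plain array. A real implementation would also have to carry the labels through slicing and reductions.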

We spent most of the time discussing how to index with tick labels.
The main issue is with integers: is an integer index a tick name or a
position index?

At the top level, datarrays always use regular Numpy indexing: an int
is a position, never a label. So darr[0] always returns the first
element of the datarray.
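A small illustration of why integer tick labels are ambiguous, using a plain array plus a hypothetical list of integer ticks. Read as a position, 0 and the tick label 0 pick out different elements.

```python
import numpy as np

# Hypothetical 1-D data whose tick labels happen to be integers.
values = np.array([10.0, 20.0, 30.0])
ticks = [2, 0, 1]  # tick label for each position

# Top-level indexing is plain NumPy: 0 is a position.
first = values[0]                   # the first element

# Looking up the *label* 0 means finding where 0 sits in the ticks.
by_label = values[ticks.index(0)]   # the element tagged with tick 0
```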

The ambiguity arises in the specialized methods that index by tick
name, because a tick name can itself be an int. To resolve it, the
proposal was to provide several tick-indexing methods [1]:

1. Integers are always labels
2. Integers are never treated as labels (they are positions)
3. Try 1, then fall back to 2
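The three options can be sketched as a single lookup function with a mode switch. The function name tick_index and the mode strings are hypothetical; the actual methods proposed in [1] may differ in name and detail.

```python
import numbers
from typing import Hashable, Sequence


def tick_index(ticks: Sequence[Hashable], key, mode: str) -> int:
    """Resolve key to a position along one axis (hypothetical helper).

    mode="label":    integers are always labels (option 1)
    mode="position": integers are never labels (option 2)
    mode="auto":     try option 1, then fall back to option 2 (option 3)
    """
    is_int = isinstance(key, numbers.Integral) and not isinstance(key, bool)
    if mode == "label":
        return list(ticks).index(key)        # ValueError if key is not a tick
    if mode == "position":
        if is_int:
            return range(len(ticks))[key]    # negative positions allowed
        return list(ticks).index(key)        # non-int keys are still tick names
    if mode == "auto":
        try:
            return list(ticks).index(key)
        except ValueError:
            if is_int:
                return range(len(ticks))[key]  # IndexError if out of range
            raise
    raise ValueError(f"unknown mode: {mode!r}")
```

With ticks [2, 0, 1], the key 0 resolves to position 1 under "label" and "auto" but to position 0 under "position". Option 3 is convenient, but the meaning of an int then depends on whether that int happens to be a tick.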

We also discussed allowing axis labels to be any hashable object
(currently only strings are allowed). The main problem: integers.
Currently if an axis is labeled, say, "time", you can do
darr.sum(axis="time"). What happens when an axis is labeled with an
int? What does the 2 in darr.sum(axis=2) refer to? A position or a
label? The same problem exists for floats since a float is (currently)
a valid axis for Numpy arrays.
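One possible policy for hashable axis names, assumed here for illustration and not decided at the meeting, mirrors the top-level indexing rule: an int axis argument is always a position, and any other hashable is looked up by name. The helpers resolve_axis and labeled_sum are hypothetical.

```python
import numbers
from typing import Hashable, Sequence

import numpy as np


def resolve_axis(axis_names: Sequence[Hashable], axis) -> int:
    """Map an axis argument to a position (hypothetical policy).

    Integers are always positions, matching plain NumPy; any other
    hashable is looked up among the axis names.
    """
    if isinstance(axis, numbers.Integral) and not isinstance(axis, bool):
        return range(len(axis_names))[axis]  # IndexError if out of range
    return list(axis_names).index(axis)      # ValueError if no such name


def labeled_sum(data: np.ndarray, axis_names, axis):
    # Sum over the axis selected by position or by name.
    return data.sum(axis=resolve_axis(axis_names, axis))


data = np.arange(24).reshape(2, 3, 4)
names = ["city", 2, "time"]  # the middle axis is named with the int 2
```

Under this policy resolve_axis(names, 2) selects the third axis ("time"), so the axis whose name is the int 2 cannot be reached by name at all. That is exactly the trade-off raised above.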

References:
[1] http://github.com/fperez/datarray/commit/3c5151baa233675b355058eb3ba028d2629bece5
