[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Joshua Holbrook josh.holbrook at gmail.com
Thu Jul 8 03:07:24 EDT 2010


On Wed, Jul 7, 2010 at 10:25 PM, Rob Speer <rspeer at mit.edu> wrote:
> Glad I finally found this discussion.
>
> I implemented some of the ideas from the SciPy BOF discussion, and
> Joshua has already merged them into his datarray on GitHub (thanks,
> Joshua, for being so fast on the merge button).
>
> To introduce these changes, here are a couple of examples of how you
> could index into a matrix whose rows represent countries, and whose
> columns represent something that is observed every four years
> (hmm...).
>
> >>> arr.country.named('Netherlands').year.named(2010)
> >>> arr.country.named('Spain').year.named(slice(1994, 2010))
> >>> arr.year.named(2006).country[0:2]
>
> First of all, a bit of terminology. Axes can have labels. Ticks (which
> are particular rows, columns, etc.) can have names. Axes and ticks
> also have indices (the sequential numbers they've always had). Feel
> free to suggest alternate terminology; I just used what sounded the
> most natural to me in the method names.
>
> Addressing by indices and addressing by tick names are separate, which
> allows integers to be tick names without a conflict. You use the
> "named" method of an axis to address it by name, while __getitem__
> only addresses it by indices. You can still take slices of names
> (makes sense for things like years), but you have to spell out "slice"
> because it's not inside square brackets.
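
To make the split between the two kinds of addressing concrete, here is a
toy sketch. It is not the datarray code: the class ToyAxis, its constructor
arguments, and the half-open treatment of name slices are all assumptions
made for illustration, and slice steps are ignored.

```python
import numpy as np


class ToyAxis:
    """Hypothetical stand-in for one axis of a labeled array (not datarray's class)."""

    def __init__(self, data, axis_index, ticks):
        self.data = np.asarray(data)
        self.axis_index = axis_index
        self.ticks = list(ticks)  # tick names, one per position along the axis

    def _take(self, key):
        # Build a full index tuple that only touches this axis.
        index = [slice(None)] * self.data.ndim
        index[self.axis_index] = key
        return self.data[tuple(index)]

    def __getitem__(self, key):
        # Square brackets always mean positional indices.
        return self._take(key)

    def named(self, name):
        # named() always means tick names, even when the names are integers.
        if isinstance(name, slice):
            # Assumption: name slices are half-open, like ordinary Python slices.
            start = None if name.start is None else self.ticks.index(name.start)
            stop = None if name.stop is None else self.ticks.index(name.stop)
            return self._take(slice(start, stop))
        return self._take(self.ticks.index(name))


# Two countries (rows) observed in five years (columns).
data = np.arange(10).reshape(2, 5)
year = ToyAxis(data, 1, [1994, 1998, 2002, 2006, 2010])

by_position = year[0]                       # column at position 0
by_name = year.named(2010)                  # column whose tick is named 2010
by_range = year.named(slice(1994, 2010))    # name slice, spelled out
```

Because integer names only ever go through named(), year[0] and
year.named(2010) cannot be confused with each other.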
>
> Then, at the axis level: My impression from the SciPy discussion was
> that people wanted to be able to look up multiple labeled axes at once
> without repeating themselves, and .aix and stuples were not
> satisfying, but we didn't come up with anything else during the
> discussion.
>
> My choice was to add a bit of attribute magic: if you get an attribute
> of a datarray that is (a) not a real attribute and (b) matches the
> label of one of its axes, you'll get that axis. So "arr.axis.country"
> can be shortened to "arr.country", for example, but if you decided to
> name your axis "T", you would be stuck with "arr.axis.T".
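
The lookup rule described above can be sketched with Python's __getattr__,
which is only called after ordinary attribute lookup has failed. This is a
toy, not the datarray implementation: ToyArray and AxisNamespace are
hypothetical names, and an axis is represented here by its position only.

```python
class AxisNamespace:
    """Hypothetical 'arr.axis' namespace: maps axis labels to axis positions."""

    def __init__(self, labels):
        self._labels = tuple(labels)

    def __getattr__(self, label):
        # Reached only when normal lookup fails; read __dict__ directly so a
        # missing _labels cannot trigger infinite recursion.
        labels = self.__dict__.get("_labels", ())
        if label in labels:
            return labels.index(label)
        raise AttributeError(label)


class ToyArray:
    """Hypothetical array wrapper with a real attribute T and labeled axes."""

    def __init__(self, labels):
        self.axis = AxisNamespace(labels)

    @property
    def T(self):
        # A real attribute: it always wins over an axis that happens to be
        # labeled "T".
        return "transpose"

    def __getattr__(self, name):
        # (a) not a real attribute, otherwise we would not be here;
        # (b) fall back to the axis labels.
        axis = self.__dict__.get("axis")
        if axis is None:
            raise AttributeError(name)
        return getattr(axis, name)


arr = ToyArray(["country", "T"])
shortcut = arr.country        # same as arr.axis.country
explicit = arr.axis.country
real_attr = arr.T             # the real attribute, not the axis
shadowed_axis = arr.axis.T    # the axis labeled "T" needs the long form
```

The fallback never shadows existing attributes, which is why an axis
labeled "T" is only reachable through the explicit namespace.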
>
> So this is the state of the code at http://github.com/rspeer/datarray
> (and also at http://github.com/jesusabdullah/datarray now). I'll even
> try to make the documentation catch up with this code if people think
> the changes are good.
> -- Rob

While I haven't had a chance to really look in-depth at the changes
myself (I'm a busy man! So many mailing lists!), I so far like the
look and sound of them. That's just my opinion, though.

While on the subject of docs: the current Sphinx docs look like they
got a bit jumbled somewhere along the way. I don't really know my
Sphinxes (or reStructuredTexts) yet, but these docs are definitely
something I'd like to get in order.

--Josh