[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Joshua Holbrook josh.holbrook at gmail.com
Thu Jul 8 10:54:58 EDT 2010


On Thu, Jul 8, 2010 at 3:13 AM, Lluís <xscript at gmx.net> wrote:
> Rob Speer writes:
>
>>>>> arr.country.named('Netherlands').year.named(2010)
>>>>> arr.country.named('Spain').year.named(slice(1994, 2010))
>>>>> arr.year.named(2006).country[0:2]
>
> This looks too verbose to me.
>
> As axes always have a total order, I'd go for the most compact representation
> (assuming 'country' is the first axis, and 'year' the second one):
>
>   arr['Netherlands','2010']
>   arr['Spain','1994':'2010']
>   arr[0:2,'2006']
>
> This is my current implementation, which also allows for slices with mixed
> integers and names everywhere.
>
> I understand this might not be the desired default behaviour, as it requires
> looking into the type of every item in '__getitem__', which might be a
> performance issue (although my current implementation tries to optimize for the
> case of integer indexes).
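A minimal sketch of the mixed name/integer '__getitem__' described above. It assumes a hypothetical wrapper class (NamedIndexArray) that keeps one tick list per axis, and it is not Lluís's actual implementation. All-integer keys take a fast path that skips the label lookup, which is the kind of optimization for integer indexes mentioned above. Label slices treat the stop label as exclusive, mirroring integer slicing; that is an assumption, since the thread does not settle it.

```python
import numpy as np


class NamedIndexArray:
    """Hypothetical wrapper: an ndarray plus one list of tick labels per axis."""

    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        # One dict per axis, mapping tick label -> integer position.
        self._lookup = [{tick: i for i, tick in enumerate(axis_ticks)}
                        for axis_ticks in ticks]

    def _translate(self, axis, key):
        # Integers and None (open slice ends) are positions already.
        if key is None or isinstance(key, (int, np.integer)):
            return key
        if isinstance(key, slice):
            # Assumption: the stop label is exclusive, like integer slicing.
            return slice(self._translate(axis, key.start),
                         self._translate(axis, key.stop),
                         key.step)
        # Anything else is looked up as a tick label on this axis.
        return self._lookup[axis][key]

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # Fast path: all-integer keys go straight to NumPy with no lookups.
        if all(isinstance(k, (int, np.integer)) for k in key):
            return self.data[key]
        return self.data[tuple(self._translate(axis, k)
                               for axis, k in enumerate(key))]


arr = NamedIndexArray(np.arange(6).reshape(2, 3),
                      [["Netherlands", "Spain"], ["1994", "2006", "2010"]])
print(arr["Netherlands", "2010"])   # 2
print(arr["Spain", "1994":"2010"])  # [3 4]
print(arr[0:2, "2006"])             # [1 4]
```

The cost Lluís mentions is visible in the slow path: every non-integer key is inspected and translated before NumPy ever sees it.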
>
> Thus, we can use something in the middle:
>
>   arr[0,1]
>   arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks'
>   arr.country['Spain'].year[1994:2010]
>
> The default '__getitem__' keeps full speed, while the 'names' attribute allows
> indexing along the lines of my previous example, and the axis-name attributes
> still allow access by axis name without requiring an explicit 'slice'.
>
> Although this is not my preferred syntax, I think it is a good compromise, and I
> could always subclass this to redirect the default '__getitem__' into
> 'names.__getitem__'.
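A sketch of the compromise syntax, assuming a hypothetical AxisArray container that stores a (name, ticks) pair per axis; the class and helper names are illustrative, not datarray's API. Plain '__getitem__' is passed straight to NumPy, 'arr.names[...]' translates one label per axis, and an unknown attribute such as 'arr.country' is looked up as an axis name through '__getattr__'. Label slices again treat the stop label as exclusive, which is an assumption.

```python
import numpy as np


def _to_position(ticks, label):
    """Translate a tick label (or a slice of labels) into a position on one axis."""
    if isinstance(label, slice):
        # Assumption: the stop label is exclusive, like integer slicing.
        start = None if label.start is None else ticks.index(label.start)
        stop = None if label.stop is None else ticks.index(label.stop)
        return slice(start, stop, label.step)
    return ticks.index(label)


class AxisArray:
    """Hypothetical container: an ndarray plus a (name, ticks) pair per axis."""

    def __init__(self, data, axes):
        self.data = np.asarray(data)
        self.axes = [(name, list(ticks)) for name, ticks in axes]

    def __getitem__(self, key):
        # Default indexing is plain NumPy: no type inspection, no lookups.
        return self.data[key]

    @property
    def names(self):
        # arr.names[...] translates one label per leading axis.
        return _NamesIndexer(self)

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, so real attributes such as
        # 'data' or 'names' shadow any axis with the same name.
        for position, (name, _) in enumerate(self.__dict__.get("axes", [])):
            if name == attr:
                return _AxisIndexer(self, position)
        raise AttributeError(attr)

    def _take(self, labels):
        """Index with {axis position: label or label slice}; keep surviving axes."""
        index, kept = [], []
        for position, (name, ticks) in enumerate(self.axes):
            if position not in labels:
                index.append(slice(None))
                kept.append((name, ticks))
                continue
            pos = _to_position(ticks, labels[position])
            index.append(pos)
            if isinstance(pos, slice):
                # A slice keeps the axis, with its ticks cut down to match.
                kept.append((name, ticks[pos]))
        result = self.data[tuple(index)]
        # A scalar label on every axis returns the bare value.
        return AxisArray(result, kept) if kept else result


class _AxisIndexer:
    """Returned by arr.<axisname>: labels index that one axis."""

    def __init__(self, owner, position):
        self._owner, self._position = owner, position

    def __getitem__(self, label):
        return self._owner._take({self._position: label})


class _NamesIndexer:
    """Returned by arr.names: one label per leading axis."""

    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        return self._owner._take(dict(enumerate(key)))


arr = AxisArray(np.arange(6).reshape(2, 3),
                [("country", ["Netherlands", "Spain"]),
                 ("year", [1994, 2006, 2010])])
print(arr[0, 1])                                   # 1
print(arr.names["Netherlands", 2010])              # 2
print(arr.country["Spain"].year[1994:2010].data)   # [3 4]
```

Because the axis accessors always read their keys as labels, integer ticks such as years do not collide with positions there; the plain '__getitem__' keeps positional meaning only.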
>
> Btw, I store the name-to-index translations in an ordered dict (keyed by
> name), so that I can also provide an 'arr.iteritems' method that yields
> (name/tick, array contents at that index) tuples. In the above syntax, this
> would probably be 'arr.<axisname>.iteritems'.
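A small sketch of the ordered-dict bookkeeping and the 'iteritems' idea, assuming the translations for one axis live in a collections.OrderedDict mapping tick -> position. Here 'iteritems' is a hypothetical free function rather than the 'arr.<axisname>.iteritems' method proposed above. Since the dict remembers insertion order, iteration follows the axis order (since Python 3.7 a plain dict preserves insertion order as well).

```python
from collections import OrderedDict

import numpy as np


def iteritems(data, tick_index, axis):
    """Yield (tick, sub-array) pairs along `axis`, in the dict's insertion order.

    `tick_index` is a hypothetical OrderedDict mapping tick -> integer position.
    """
    for tick, position in tick_index.items():
        # np.take with a scalar index drops `axis` from the result.
        yield tick, np.take(data, position, axis=axis)


data = np.arange(6).reshape(2, 3)
years = OrderedDict((tick, i) for i, tick in enumerate([1994, 2006, 2010]))
for year, column in iteritems(data, years, axis=1):
    print(year, column)
# 1994 [0 3]
# 2006 [1 4]
# 2010 [2 5]
```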
>
> Another feature I like is being able to translate back and forth between
> names/ticks and integers, which I do through my 'Dimension.__getitem__' method
> (Dimension is the equivalent of datarray's 'Axis').
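A sketch of the two-way translation, assuming a hypothetical Dimension class that keeps a list (position -> tick) and a dict (tick -> position). Lluís routes this through 'Dimension.__getitem__'; the sketch uses two explicit methods instead, because a single '__getitem__' that dispatches on type cannot tell a position from a tick when the ticks are themselves integers, as years are here.

```python
class Dimension:
    """Hypothetical stand-in for a named axis with tick labels."""

    def __init__(self, name, ticks):
        self.name = name
        self.ticks = list(ticks)  # position -> tick
        self._positions = {tick: i for i, tick in enumerate(self.ticks)}  # tick -> position
        if len(self._positions) != len(self.ticks):
            # Duplicate ticks would make tick -> position ambiguous.
            raise ValueError("ticks must be unique to translate both ways")

    def position(self, tick):
        """Tick label -> integer position."""
        return self._positions[tick]

    def tick(self, position):
        """Integer position -> tick label."""
        return self.ticks[position]


year = Dimension("year", [1994, 2006, 2010])
print(year.position(2006))  # 1
print(year.tick(2))         # 2010
```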
>
> PS: I also separate axes from their naming, meaning that I can have a single
> axis carrying both 'country' and 'year', which I would index with
> 'Netherlands-2010' (other examples make more sense), while still being able to
> access each component separately. This reduces the size of the full ndarray, as
> there is no need for so many NaNs to pad it to a homogeneous shape, and it brings
> the ndarray closer to how the user thinks about the structure of the data.
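A sketch of a single axis whose ticks combine 'country' and 'year', assuming a hypothetical CompositeAxis class; the combined-label format and the sample values are made up for illustration. Only observed (country, year) pairs are stored, so no NaN padding is needed, yet each component can still be selected on its own. It assumes no field value contains the separator '-'.

```python
import numpy as np


class CompositeAxis:
    """Hypothetical single axis whose ticks combine several named fields."""

    def __init__(self, fields, ticks, sep="-"):
        self.fields = tuple(fields)            # e.g. ("country", "year")
        self.ticks = [tuple(t) for t in ticks]
        self.sep = sep                         # assumption: no field value contains sep
        self._positions = {t: i for i, t in enumerate(self.ticks)}

    def position(self, label):
        """Combined label such as 'Netherlands-2010' -> integer position."""
        return self._positions[tuple(label.split(self.sep))]

    def positions_where(self, field, value):
        """Positions of every tick whose `field` component equals `value`."""
        k = self.fields.index(field)
        return [i for i, t in enumerate(self.ticks) if t[k] == value]


# Only the three observed pairs are stored; a full 2 x 2 country-by-year
# grid would need a NaN for the missing Netherlands-1994 cell.
axis = CompositeAxis(("country", "year"),
                     [("Netherlands", "2010"), ("Spain", "1994"), ("Spain", "2010")])
values = np.array([10.0, 20.0, 30.0])   # illustrative values only
print(values[axis.position("Netherlands-2010")])         # 10.0
print(values[axis.positions_where("country", "Spain")])  # [20. 30.]
```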
>
> Read you,
>     Lluis
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth

>  arr['Netherlands','2010']

Isn't this the __getitem__ behaviour we were trying to avoid?

--Josh


