[Numpy-discussion] Proposed record array behavior: the rest of the story: updated

Wed Jul 28 07:00:11 EDT 2004

On Wed, 28 Jul 2004 12:00:40 +0200
Francesc Alted <falted at pytables.org> wrote:

> On Tuesday 27 July 2004 22:04, gerard.vermeulen at grenoble.cnrs.fr wrote:
> > Introducing recordArray["column"] as an alternative for
> > recordArray.field("column") breaks a symmetry between, for instance, 1-d
> > record arrays and 2-d normal arrays. (The symmetry is strongly suggested
> > by their representation: a record array prints almost as a list of tuples
> > and a 2-d normal array almost as a list of lists.)
> > 
> > Indexing a column of a 2-d normal array is done by normalArray[:, column],
> > so why not recArray[:, "column"] ?
> 
> Well, I must admit that this has its beauty (it reveals the symmetry
> that you mentioned). However, mixing integers and strings in indices can
> be, in my opinion, rather confusing for most people. Also, I guess that
> the implementation wouldn't be easy.
> 
> > I prefer to use
> > 
> > recordArray.column[32]
> > 
> > and/or
> > 
> > recordArray[32].column
> > 
> > rather than recordArray["column"][32].
> 
> I would rather have:
> 
> recordArray.fields.column[32]
> 
> or
> 
> recordArray.cols.column[32]
> 
> (note the use of the plural in fields and cols, which I think is more
> consistent with what they do)
> 
> The problem with:
> 
> recordArray[32].fields.column
> 
> is that it does not look natural to me and, besides, tab completion
> would break after the [] brackets.
>
Two points:

1. This is true for vanilla Python but not for IPython-0.6.2:

packer at zombie:~> ipython
Python 2.3+ (#1, Jan  7 2004, 09:17:35)
Type "copyright", "credits" or "license" for more information.

IPython 0.6.2 -- An enhanced Interactive Python.
?       -> Introduction to IPython's features.
@magic  -> Information about IPython's 'magic' @ functions.
help    -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]: d = {'Francesc': 0}

In [2]: d['Francesc'].__a
d['Francesc'].__abs__  d['Francesc'].__add__  d['Francesc'].__and__

In [2]: d['Francesc'].__a

   You see, the completion mechanism of ipython recognizes d['Francesc'] as an
   integer.

2. If one accepts that a "field_name" can be used as an attribute, one must be
   able to say:

   record.field_name ( == record.field("field_name") )

   and (since recordArray[32] returns a record) also:

   recordArray[32].field_name

   and not

   recordArray[32].cols.field_name (sorry, I abhor this)
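
A minimal sketch of point 2, in plain Python rather than numarray. The
classes Record and RecordArray and the field names "x" and "y" are
hypothetical and only illustrate how record.field_name can be an alias for
record.field("field_name"), so that recordArray[i].field_name follows for free:

```python
class Record:
    """Hypothetical toy record: one row with named fields."""

    def __init__(self, names, values):
        self._names = tuple(names)
        self._values = list(values)

    def field(self, name):
        # Explicit accessor, mirroring record.field("field_name").
        return self._values[self._names.index(name)]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self.field(name)
        except ValueError:
            raise AttributeError(name) from None


class RecordArray:
    """Hypothetical toy 1-d record array stored as a list of rows."""

    def __init__(self, names, rows):
        self._names = tuple(names)
        self._rows = [list(row) for row in rows]

    def __getitem__(self, index):
        # Integer indexing returns a single record.
        return Record(self._names, self._rows[index])


rec_array = RecordArray(["x", "y"], [(1, 2.5), (3, 4.5)])
record = rec_array[1]
print(record.x == record.field("x"))
```

Because indexing returns a Record, no extra .cols or .fields step is needed
between the brackets and the field name.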

> 
> Anyway, as Russell suggested, I don't like recordArray["column"][32],
> because it would be unnecessary (you can get the same result using
> recordArray[column_idx][32]).
>

Thank you for this little slip: you mean that recordArray["column"][32] is
the same as recordArray[32][column_idx], don't you?
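
The equivalence can be checked with a toy stand-in. This is a sketch under
assumptions: the class RecArray, its attributes, and the field names "id"
and "mass" are made up for illustration and are not the numarray API:

```python
class RecArray:
    """Hypothetical record array: a string selects a column, an int a row."""

    def __init__(self, names, rows):
        self.names = tuple(names)
        self.rows = [tuple(row) for row in rows]

    def __getitem__(self, key):
        if isinstance(key, str):
            # The proposed recordArray["column"]: gather one field per row.
            col = self.names.index(key)
            return [row[col] for row in self.rows]
        # An integer selects one row (a record), as before.
        return self.rows[key]


rec = RecArray(("id", "mass"), [(0, 1.5), (1, 2.5), (2, 3.5)])
column_idx = rec.names.index("mass")
print(rec["mass"][2] == rec[2][column_idx])
```

Note that here the row index comes first in rec[2][column_idx]; the column
index comes first only in the string form.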

> 
> Although I admit that recordArray.cols["column"][32] would not hurt
> my eyes so much. This is because, although the indices continue to mix
> ints and strings, ".cols" is placed first, giving a new (and
> unmistakable) meaning to the "column" index.
> 

I am just worried that future generalization of indexing will be impossible
if the meaning of an indexing operation ("get row" or "get column or field")
depends on whether an index is a string or an integer: IMO the meaning
should depend on the position in the index list.

The following example has been chosen to show that I don't mind indexing by
strings at all. If I see array[13, 'ab', 31, 'ba'], I know that 'ab' and 'ba'
index record fields as long as the indices are in 'normal' order.

Nevertheless, I am aware that Utopia may be hard to implement efficiently, but
this reflects my mental picture of nested (record) arrays.

(ipython in Utopia would allow me to figure out array[13].ab[31].ba by tab
 completion and I would translate this to array[13, 'ab', 31, 'ba'] for
 efficiency in a real program)
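
The positional reading of array[13, 'ab', 31, 'ba'] can be sketched as
follows. This is only an illustration under assumptions: NestedRecords is a
hypothetical class, the data is nested lists (rows) of dicts (fields), and
the strict row/field alternation is one possible rule, not a proposal for
the real implementation:

```python
class NestedRecords:
    """Hypothetical wrapper around lists (rows) of dicts (fields)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        keys = index if isinstance(index, tuple) else (index,)
        item = self._data
        for position, key in enumerate(keys):
            # The position decides the meaning: even positions select a
            # row, odd positions select a field of the selected record.
            expected = int if position % 2 == 0 else str
            if not isinstance(key, expected):
                raise TypeError(
                    f"index {position} must be {expected.__name__}, "
                    f"got {type(key).__name__}"
                )
            item = item[key]
        return item


data = [
    {"ab": [{"ba": 10}, {"ba": 20}]},
    {"ab": [{"ba": 30}, {"ba": 40}]},
]
array = NestedRecords(data)
value = array[1, "ab", 0, "ba"]
print(value == data[1]["ab"][0]["ba"])
```

An index in the wrong position, such as array["ab", 1], is rejected instead
of being reinterpreted according to its type.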

I think that we agree that recordArray.cols["column"] is better than
recordArray["column"], but I don't see why recordArray.cols["column"] is
better than the original recordArray.field("column").

Cheers -- Gerard

PS: after reading the above, there may be a case to accept only indexing
    which can be read from left to right, so
    recordArray[32].field_name is OK, but recordArray.field_name[32] is not.