[Numpy-discussion] Tabular data package

Mon Oct 5 18:58:34 EDT 2009

On Mon, Oct 5, 2009 at 17:52, Elaine Angelino <elaine.angelino at gmail.com> wrote:
> On Mon, Oct 5, 2009 at 6:36 PM, Robert Kern <robert.kern at gmail.com> wrote:
>
>> > the main reason we went with the recarray over the ndarray is because the
>> > recarray has a couple of useful construction functions (e.g.
>> > np.rec.fromrecords and np.rec.fromarrays).  not only are these functions
>> > convenient to use, they have nice data type inference properties which we'd
>> > have to rebuild ourselves if we wanted to avoid recarrays entirely.
>>
>> Try np.rec.fromrecords(...).view(np.ndarray).
>>
>
> Hi Robert, thanks for your email.  We definitely understand this use of
> .view().  However, our question is: should we have implemented tabular
> this way, e.g. in the tabarray constructor, first make a recarray and then
> view it as an ndarray?  (And then, of course, view it as a tabarray.)

Do the minimum number of .view()s that you can get away with.

> This
> would have the effect of eliminating the extra recarray functionality, and
> some of its overhead as well. Is this the desirable design, or should we
> stick with recarrays?

Well, what other recarray functionality are you using? I addressed the
from*() functions because you said it was the main reason. What are
your other reasons?

> (Also, is first casting to recarrays and then viewing as ndarrays more
> expensive than if we went through ndarray directly?)

The overhead should be minuscule. No data is converted; .view() only
creates a new array object over the same memory.
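
A minimal sketch of that point, for illustration only. The sample records
and the field names ("id", "value", "label") are made up for this example
and are not taken from tabular; only np.rec.fromrecords, .view() and
np.shares_memory are real NumPy calls.

```python
import numpy as np

# Made-up sample rows; fromrecords infers a dtype for each column.
records = [(1, 2.5, "a"), (2, 3.5, "b")]
rec = np.rec.fromrecords(records, names=["id", "value", "label"])

# Reinterpret the same buffer as a plain ndarray. No data is copied.
plain = rec.view(np.ndarray)

print(type(plain))                   # the base ndarray class
print(plain.dtype.names)             # field names survive the view
print(np.shares_memory(rec, plain))  # both objects use one buffer

# A write through the view is visible through the original recarray.
plain["value"][0] = 9.0
print(rec.value[0])
```

Fields are still reachable by name with plain["value"]; what goes away is
the attribute-style access (rec.value) that recarray adds.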

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco