[Numpy-discussion] Setting custom dtypes and 1.14

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jan 29 14:10:56 EST 2018


On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> I think that there's a lot of confusion going around about recarrays vs
> structured arrays.
>
> [`recarray`](https://github.com/numpy/numpy/blob/v1.13.0/numpy/core/records.py)
> objects are a wrapper around structured arrays that provide:
> * Attribute access to fields as `arr.field` in addition to the normal
> `arr['field']`
> * Automatic datatype-guessing for nested lists of tuples (which needs a
> little work, but seems like a justifiable feature)
> * An undocumented `field` method that behaves like the 1.14 indexing
> behavior (!)
>
> Meanwhile, `recfunctions` is a collection of functions that work on normal
> structured arrays - so is misleadingly named.
> The only link to recarrays is that most of the functions have an
> `asrecarray` parameter which applies `.view(recarray)` to the result.
>
> > deprecate recarrays
>
> Given how thin an abstraction they are over structured arrays, I don't
> think you mean this.
> Are you advocating for deprecating structured arrays entirely, or just
> deprecating recfunctions?
>
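
To make the distinction drawn above concrete, here is a minimal sketch of a structured array, a recarray view of it, and a recfunctions call with `asrecarray=True`. The array contents and the field names (`id`, `value`, `flag`) are made up for illustration, and `append_fields` merely stands in for any of the recfunctions that accept `asrecarray`.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A plain structured array: fields are reached by name through indexing.
# (Field names and values are hypothetical.)
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("value", "f8")])
print(arr["value"])

# Viewing it as a recarray adds attribute access on the same memory.
rec = arr.view(np.recarray)
print(rec.value)

# recfunctions operate on ordinary structured arrays; asrecarray=True
# only changes the type of the returned array to recarray.
extended = rfn.append_fields(
    arr, "flag", np.array([True, False]), usemask=False, asrecarray=True
)
print(type(extended).__name__, extended.dtype.names)
```

Nothing here depends on the input being a recarray, which is why the name `recfunctions` can mislead.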

First, statsmodels is in the pandas camp for dataframes, so I don't have
any vested interest in recarrays/structured dtypes anymore.

What I meant was that structured dtypes with implicit (hidden?) padding
become unintuitive for the recarray/dataframe use case. (At least I won't
try to update my intuition about having extra things in there that are not
specified by the main structured dtype.) Also, the dataframe_like usage of
structured dtypes doesn't seem to be under much consideration anymore.
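
A rough sketch of what hidden padding looks like in practice, assuming a NumPy release in which multi-field indexing returns a view (1.16 and later, as far as is known) and in which `numpy.lib.recfunctions.repack_fields` is available. The field names `a`, `b`, `c` are hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A "dataframe-like" structured array with three columns (names are made up).
data = np.zeros(3, dtype=[("a", "i4"), ("b", "f8"), ("c", "i4")])

# Selecting two columns. When this returns a view, the dtype keeps the
# original offsets and itemsize, so the bytes of "b" remain in every
# record as padding that no field name refers to.
sub = data[["a", "c"]]
print(sub.dtype.names)
print(sub.dtype.itemsize, "bytes per record in the selection")

# repack_fields converts to a packed layout without the unnamed bytes.
packed = rfn.repack_fields(sub)
print(packed.dtype.itemsize, "bytes per record after repacking")
```

The selection names only two 4-byte fields, yet its records can be wider than the 8 bytes those names account for; that difference is the "extra things in there" that the field list does not mention.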

So, my **impression** is that the recent changes make the
recarray/dataframe use case for structured dtypes more difficult.

Given that there are pandas, xarray, dask and more, numpy might as well drop
any pretense of supporting dataframe_likes. Or adjust the recfunctions so
we can still work dataframe_like with structured
dtypes/recarrays/recfunctions.


Josef



>
> Eric
>
> On Mon, 29 Jan 2018 at 09:39 Chris Barker <chris.barker at noaa.gov> wrote:
>
>> On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane <allanhaldane at gmail.com>
>> wrote:
>>
>>> On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote:
>>>
>>>>     I thought recarrays were pretty cool back in the day, but pandas is
>>>>     a much better option.
>>>>
>>>>     So I pretty much only use structured arrays for data exchange with C
>>>>     code....
>>>>
>>>> My impression is that this turns into a deprecate recarrays and
>>>> supporting recfunction issue.
>>>>
>>>
>>> *should* we have any dataframe-like functionality in numpy?
>>>
>>> We get requests every once in a while about how to sort rows, or about
>>> adding a "groupby" function. I myself have used recarrays in a
>>> dataframe-like way, when I wanted a quick multiple-array object that
>>> supported numpy indexing. So there is some demand to have minimal
>>> "dataframe-like" behavior in numpy itself.
>>>
>>> recarrays play part of this role currently, though imperfectly due to
>>> padding and cache issues. I think I'm comfortable with supporting some
>>> minor use of structured/recarrays as dataframe-like, with a warning in docs
>>> that the user should really look at pandas/xarray, and that structured
>>> arrays are primarily for data exchange.
>>>
>>
>> Well, I think we should either:
>>
>> deprecate recarrays -- i.e. explicitly not support DataFrame-like
>> functionality in numpy, keeping only the data-exchange functionality as
>> maintained.
>>
>> or
>>
>> Properly support it -- which doesn't mean re-implementing Pandas or
>> xarray, but it would mean addressing any bug-like issues like not dealing
>> properly with padding.
>>
>> Personally, I don't need/want it enough to contribute, but if someone
>> does, great.
>>
>> This reminds me a bit of the old numpy.matrix issue -- it was ALMOST
>> there, but not quite, with issues, and there was essentially no overlap
>> between the people that wanted it and the people that had the time and
>> skills to really make it work.
>>
>>> (If we want to dream, maybe one day we should make a minimal
>>> multiple-array container class. I imagine it would look pretty similar to
>>> recarray, but stored as a set of arrays instead of a structured array. But
>>> maybe recarrays are good enough, and let's not reimplement pandas either.)
>>>
>>
>> Exactly -- we really don't need to re-implement Pandas....
>>
>> (except its CSV reading capability :-) )
>>
>> -CHB
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker at noaa.gov
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>