[Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

Allan Haldane allanhaldane at gmail.com
Fri Oct 16 21:31:11 EDT 2015


On 10/16/2015 09:17 PM, josef.pktd at gmail.com wrote:
>
>
> On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane <allanhaldane at gmail.com> wrote:
>
>     On 10/16/2015 05:31 PM, josef.pktd at gmail.com wrote:
>     >
>     >
>     > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris
>     > <charlesr.harris at gmail.com> wrote:
>     >
>     >
>     >
>     >     On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris
>     >     <charlesr.harris at gmail.com> wrote:
>     >
>     >
>     >
>     >         On Fri, Oct 16, 2015 at 11:58 AM, <josef.pktd at gmail.com> wrote:
>      >
>      >             was there a change with reduce operations with recarrays
>      >             in 1.10 or 1.10.1?
>      >
>      >             Travis shows a new test failure in the statsmodels testsuite
>      >             with 1.10.1:
>      >
>      >             ERROR: test suite for <class 'statsmodels.base.tests.test_data.TestRecarrays'>
>      >
>      >               File "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py", line 131, in _handle_constant
>      >                 const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
>      >             TypeError: cannot perform reduce with flexible type
>      >
>      >
>      >             Sorry for asking so late.
>      >             (statsmodels is short on maintainers, and I'm distracted)
>      >
>      >
>      >             statsmodels still has code to support recarrays and
>      >             structured dtypes from the time before pandas became
>      >             popular, but I don't think anyone is using them together
>      >             with statsmodels anymore.
>      >
>      >
>      >         There were several commits dealing with both recarrays and
>      >         ufuncs, so this might well be a regression.
>      >
>      >
>      >     A bisection would be helpful. Also, open an issue.
>      >
>      >
>      >
>      > The reason for the test failure might be somewhere else hiding behind
>      > several layers of statsmodels, but only started to show up with
>      > numpy 1.10.1.
>      >
>      > I already have the reduce exception with my currently installed numpy
>      > '1.9.2rc1'
>      >
>      >>>> x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
>      > ('x_2', 'f8')]).view(np.recarray)
>      >
>      >>>> np.ptp(x, axis=0)
>      > Traceback (most recent call last):
>      >   File "<stdin>", line 1, in <module>
>      >   File "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py", line 2047, in ptp
>      >     return ptp(axis, out)
>      > TypeError: cannot perform reduce with flexible type
>      >
>      >
>      > Sounds like fun, and I don't even know how to automatically bisect.
>      >
>      > Josef
>
>     That example isn't the problem (ptp should definitely fail on structured
>     arrays), but I've tracked down what is - it has to do with views of
>     record arrays.
>
>     The fix looks simple, I'll get it in for the next release.
>
>
> Thanks,
>
> I realized that at that point in the statsmodels code we should have
> only regular ndarrays, so the array conversion fails somewhere.
>
> AFAICS, the main helper function to convert is
>
> def struct_to_ndarray(arr):
>      return arr.view((float, len(arr.dtype.names)))
>
> which doesn't look like it will handle other dtypes than float64. Nobody
> ever complained, so maybe our test suite is the only user of this.
>
> What is now the recommended way of converting structured
> dtypes/recarrays to ndarrays?
>
> Josef

Yes, that's the code I narrowed it down to as well. I think the code in
statsmodels is fine; the problem is actually a bug that, I must admit, I
introduced in changes to the way views of recarrays work.
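
For reference, here is a minimal sketch of that conversion applied to a
plain structured ndarray (not a recarray view, which is where the bug
shows up). The field names come from Josef's example above; the
deterministic data and the constant column are made up for illustration,
and the helper only makes sense when every field is float64.

```python
import numpy as np

def struct_to_ndarray(arr):
    # View each record as a row of floats; assumes all fields are float64.
    return arr.view((float, len(arr.dtype.names)))

dt = [('const', 'f8'), ('x_1', 'f8'), ('x_2', 'f8')]
x = np.arange(27, dtype='f8').view(dt)   # 9 records, plain structured ndarray
x['const'] = 1.0                         # illustrative constant column

exog = struct_to_ndarray(x)              # regular 2-d float array
# Columns with zero range are constant, as in _handle_constant.
const_idx = np.where(np.ptp(exog, axis=0) == 0)[0]
print(exog.shape, const_idx)
```

On a regular ndarray like `exog`, the ptp reduction is well defined, so
the "flexible type" error does not come up.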

If you are curious, the bug is in this line:

https://github.com/numpy/numpy/blob/master/numpy/core/records.py#L467

This line was intended to fix the problem that accessing a nested record
array field would lose the 'np.record' dtype. I only considered
structured (void) dtypes, and forgot about sub-array dtypes such as
`(float, 3)`, which also have type void and which statsmodels uses.

I think the fix is to replace `issubclass(val.type, nt.void)` with 
`val.names` or something similar. I'll take a closer look soon.
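
To illustrate why the two checks differ, here is a small sketch comparing
a structured dtype with a sub-array dtype. The particular dtypes are just
examples I picked, not the ones from the statsmodels test.

```python
import numpy as np

structured = np.dtype([('a', 'f8'), ('b', 'f8')])  # has named fields
subarray = np.dtype(('f8', 3))                     # sub-array dtype, no fields

for dt in (structured, subarray):
    # Both report np.void as their type, but only the structured
    # dtype has field names; .names is None for the sub-array dtype.
    print(issubclass(dt.type, np.void), dt.names)
```

So a test on `names` separates the two cases, while the `nt.void` test
treats them the same.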

Allan



