[Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

Wed Jul 6 17:17:50 EDT 2016

On Wed, Jul 6, 2016 at 4:56 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
> On Wed, Jul 6, 2016 at 6:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Jul 5, 2016 11:21 PM, "Ralf Gommers" <ralf.gommers at gmail.com> wrote:
>> >
>> >
>> >
>> > On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> >
>> >> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz"
>> >> <jfoxrabinovitz at gmail.com> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a
>> >> > function np.atleast_nd in PR#7804
>> >> > (https://github.com/numpy/numpy/pull/7804).
>> >> >
>> >> > As a result of this PR, I have a couple of questions about
>> >> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with
>> >> > the dimensions: If the input is 1D, it prepends and appends a size-1
>> >> > dimension. If the input is 2D, it appends a size-1 dimension. This is
>> >> > inconsistent with `np.atleast_2d`, which always prepends (as does
>> >> > `np.atleast_nd`).
>> >> >
>> >> >   - Is there any reason for this behavior?
>> >> >   - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in
>> >> > terms of `np.atleast_nd`, which is actually much simpler)? This would
>> >> > be a slight API change since the output would not be exactly the
>> >> > same.
>> >>
>> >> Changing atleast_3d seems likely to break a bunch of stuff...
>> >>
>> >> Beyond that, I find it hard to have an opinion about the best design
>> >> for these functions, because I don't think I've ever encountered a situation
>> >> where they were actually what I wanted. I'm not a big fan of coercing
>> >> dimensions in the first place, for the usual "refuse to guess" reasons. And
>> >> then generally if I do want to coerce an array to another dimension, then I
>> >> have some opinion about where the new dimensions should go, and/or I have
>> >> some opinion about the minimum acceptable starting dimension, and/or I have
>> >> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix;
>> >> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that
>> >> requirements list.)
>> >>
>> >> I don't know how typical I am in this. But it does make me wonder if
>> >> the atleast_* functions act as an attractive nuisance, where new users take
>> >> their presence as an implicit recommendation that they are actually a useful
>> >> thing to reach for, even though they... aren't that. And maybe we should be
>> >> recommending folk move away from them rather than trying to extend them
>> >> further?
>> >>
>> >> Or maybe they're totally useful and I'm just missing it. What's your
>> >> use case that motivates atleast_nd?
>> >
>> > I think you're just missing it:) atleast_1d/2d are used quite a bit in
>> > Scipy and Statsmodels (those are the only ones I checked), and in the large
>> > majority of cases it's the best thing to use there. There's a bunch of
>> > atleast_2d calls with a transpose appended because the input needs to be
>> > treated as columns instead of rows, but that's still efficient and readable
>> > enough.
>>
>> I know people *use* it :-). What I'm confused about is in what situations
>> you would invent it if it didn't exist. Can you point me to an example or
>> two where it's "the best thing"? I actually had statsmodels in mind with my
>> example of wanting the semantics "coerce 1d inputs into a column matrix; 0d
>> or 3d inputs are an error". I'm surprised if there are places where you
>> really want 0d arrays converted into 1x1,
>
> Scalar to shape (1,1) is less common, but 1-D to 2-D or scalar to shape (1,)
> is very common. An example is at the top of scipy/stats/stats.py: the
> _chk_asarray functions (used in many other functions) take care to never
> return scalar arrays because those are plain annoying to deal with. If that
> sounds weird to you, you're probably one of those people who was never
> surprised by this:
>
> In [3]: x0 = np.array(1)
>
> In [4]: x1 = np.array([1])
>
> In [5]: x0[0]
> ---------------------------------------------------------------------------
> IndexError                                Traceback (most recent call last)
> <ipython-input-5-6a57e371ca72> in <module>()
> ----> 1 x0[0]
>
> IndexError: too many indices for array
>
> In [6]: x1[0]
> Out[6]: 1
>
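
For reference, a minimal sketch of the kind of guard described above. The helper name `as_indexable` is hypothetical and is not the actual `_chk_asarray` code from scipy; it only illustrates how `np.atleast_1d` turns a 0-d array into something that can be indexed:

```python
import numpy as np

def as_indexable(a):
    """Hypothetical helper: return `a` as an array with at least one dimension."""
    return np.atleast_1d(np.asarray(a))

x0 = as_indexable(1)        # scalar input becomes shape (1,)
x1 = as_indexable([1, 2])   # 1-D input keeps its shape (2,)

print(x0.shape, x0[0])      # indexing no longer raises IndexError
print(x1.shape, x1[0])
```
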
>> or want to allow high dimensional arrays to pass through - and if you do
>> want to allow high dimensional arrays to pass through, then transposing
>> might help with 2d cases but will silently mangle high-d cases, right?
>
> Handling of >2-D input is usually irrelevant. The vast majority of cases are
> "function that accepts scalar and 1-D array" or "function that accepts 1-D
> and 2-D arrays".

Often such a function would want to convert inputs internally.
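
A rough sketch of what that internal conversion might look like for the "1-D and 2-D" case. The function name `column_means` is made up for illustration and does not come from any library; the shapes in the comments follow the behavior described earlier in the thread (`np.atleast_2d` prepends a dimension, `np.atleast_3d` prepends and appends for 1-D input and appends for 2-D input):

```python
import numpy as np

def column_means(x):
    """Hypothetical example: treat 1-D input as one column, 2-D input as-is."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 0 or x.ndim > 2:
        # Refuse inputs outside the supported range instead of guessing.
        raise ValueError("expected 1-D or 2-D input")
    if x.ndim == 1:
        x = np.atleast_2d(x).T   # (n,) -> (1, n) -> (n, 1)
    return x.mean(axis=0)

v = np.arange(3)
print(np.atleast_2d(v).shape)                # (1, 3)
print(np.atleast_3d(v).shape)                # (1, 3, 1)
print(np.atleast_3d(np.ones((2, 3))).shape)  # (2, 3, 1)
print(column_means(v).shape)                 # (1,): one column, one mean
```

The explicit `ndim` check is one possible way to get the "0d or 3d inputs are an error" semantics mentioned above; `np.atleast_2d` on its own would let those inputs through.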

>
> Ralf