[Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not?

josef.pktd at gmail.com josef.pktd at gmail.com
Sat Feb 22 09:53:29 EST 2020


On Sat, Feb 22, 2020 at 9:41 AM <josef.pktd at gmail.com> wrote:

>
>
> On Sat, Feb 22, 2020 at 9:34 AM <josef.pktd at gmail.com> wrote:
>
>> not having a hashable tuple conversion would be a strong limitation
>>
>> a = tuple(np.arange(5))
>> versus
>> a = tuple([np.array(i) for i in range(5)])
>> {a:5}
>>
>
> also there is the question of which scalar
>
> .item() versus [()]
>
> This was used in the old times in scipy.stats, and I just saw
> https://github.com/scipy/scipy/pull/11165#issuecomment-589952838
>
> aside:
> AFAIR, I use 0-dim arrays also to ensure that I have a numpy dtype and
> not, e.g. some equivalent python type
>

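A small sketch of the two points quoted above (hashability of the tuple, and which object `.item()` versus `[()]` hands back). The variable names are only illustrative, and the behavior shown is what I would expect from current NumPy releases, not a statement about the new DTypes:

```python
import numpy as np

# A tuple of NumPy scalars is hashable and can serve as a dict key.
key_scalars = tuple(np.arange(5))          # elements are NumPy integer scalars
lookup = {key_scalars: 5}

# A tuple of 0-d arrays is not hashable, because ndarray defines no hash.
key_arrays = tuple([np.array(i) for i in range(5)])
try:
    {key_arrays: 5}
    arrays_hashable = True
except TypeError:
    arrays_hashable = False

# Which scalar: .item() returns a Python object, [()] returns a NumPy scalar.
a = np.asarray(5.0)
py_value = a.item()    # builtin float
np_value = a[()]       # NumPy float scalar, keeps the NumPy dtype
```

So `[()]` is the spelling to use when the NumPy dtype should survive, and `.item()` when a plain Python object is wanted.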
0-dim as mutable pseudo-scalar


a = np.asarray(5)
a, id(a)
(array(5), 844574884528)

a[()] = 1
a, id(a)
(array(1), 844574884528)

Maybe I never actually used that.
In a recent similar case, I could just use a 1-d list or array to work
around Python's mutation or mutability behavior.
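
To make the contrast concrete, here is a minimal sketch of a 0-d array used as a mutable cell, next to an immutable NumPy scalar and the one-element-list workaround. `make_counter` is a hypothetical helper invented for this illustration, not something from scipy or NumPy:

```python
import numpy as np

# A 0-d array is a mutable cell: assignment through [()] updates it in place,
# so every name bound to the same object sees the new value.
cell = np.asarray(5)
alias = cell
cell[()] = 1
same_object = alias is cell
alias_value = alias[()]

# A NumPy scalar is immutable: "+=" rebinds the name to a new object.
s = np.int64(5)
t = s
s += 1          # t still refers to the old value

# Plain-Python workaround: a one-element list as the mutable container.
def make_counter():
    count = [0]                 # list cell shared with the closure
    def increment():
        count[0] += 1           # mutates the list, no rebinding needed
        return count[0]
    return increment

counter = make_counter()
first = counter()
second = counter()
```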



> Josef
>
>
>>
>> Josef
>>
>> On Sat, Feb 22, 2020 at 9:28 AM Evgeni Burovski <
>> evgeny.burovskiy at gmail.com> wrote:
>>
>>> Hi Sebastian,
>>>
>>> Just to clarify the difference:
>>>
>>> >>> x = np.float64(42)
>>> >>> y = np.array(42, dtype=float)
>>>
>>> Here `x` is a scalar and `y` is a 0D array, correct?
>>> If that's the case, not having the former would be very confusing for
>>> users (at least, that would be very confusing to me, FWIW).
>>>
>>> If anything, I think it'd be cleaner to not have the latter, and only
>>> have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it
>>> is probably way too late to even think about it anyway.
>>>
>>> Cheers,
>>>
>>> Evgeni
>>>
>>> On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg
>>> <sebastian at sipsolutions.net> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > When we create new datatypes, we have the option to make new choices
>>> > for the new datatypes [0] (not the existing ones).
>>> >
>>> > The question is: Should every NumPy datatype have a scalar associated
>>> > and should operations like indexing return a scalar or a 0-D array?
>>> >
>>> > This is in my opinion a complex, almost philosophical, question, and we
>>> > do not have to settle anything for a long time. But, if we do not
>>> > decide a direction before we have many new datatypes the decision will
>>> > make itself...
>>> > So happy about any ideas, even if it's just a gut feeling :).
>>> >
>>> > There are various points. I would like to mostly ignore the technical
>>> > ones, but I am listing them anyway here:
>>> >
>>> >   * Scalars are faster (although that can likely be optimized)
>>> >
>>> >   * Scalars have a lower memory footprint
>>> >
>>> >   * The current implementation incurs a technical debt in NumPy.
>>> >     (I do not think that is a general issue, though. We could
>>> >     probably create scalars automatically for each new datatype.)
>>> >
>>> > Advantages of having no scalars:
>>> >
>>> >   * No need to keep track of scalars in order to preserve them in
>>> >     ufuncs or in libraries using `np.asarray`. (Would such libraries
>>> >     need an `np.asarray_or_scalar`? Or they decide to always return
>>> >     arrays, although ufuncs may not.)
>>> >
>>> >   * Seems simpler in many ways, you always know the output will be an
>>> >     array if it has to do with NumPy.
>>> >
>>> > Advantages of having scalars:
>>> >
>>> >   * Scalars are immutable and we are used to them from Python.
>>> >     A 0-D array cannot be used as a dictionary key consistently [1].
>>> >
>>> >     I.e. without scalars as first-class citizens, `dict[arr1d[0]]`
>>> >     cannot work; `dict[arr1d[0].item()]` may (if `.item()` is defined),
>>> >     and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
>>> >
>>> >   * Object arrays as we have them now make sense: `arr1d[0]` can
>>> >     reasonably return a Python object. I.e. arrays feel more like
>>> >     containers if you can take elements out easily.
>>> >
>>> > Could go both ways:
>>> >
>>> >   * Scalar math: without scalars, `scalar = arr1d[0]; scalar += 1`
>>> >     modifies the array. With scalars, `arr1d[0, ...]` clarifies the
>>> >     meaning. (In principle it is good to never use `arr2d[0]` to
>>> >     get a 1D slice, probably more so if scalars exist.)
>>> >
>>> > Note: array-scalars (the current NumPy scalars) are not useful in my
>>> > opinion [3]. A scalar should not be indexed or have a shape. I do not
>>> > believe in scalars pretending to be arrays.
>>> >
>>> > I personally tend towards liking scalars.  If Python was a language
>>> > where the array (array-programming) concept was ingrained into the
>>> > language itself, I would lean the other way. But users are used to
>>> > scalars, and they "put" scalars into arrays. Array objects are in some
>>> > ways strange in Python, and I feel not having scalars detaches them
>>> > further.
>>> >
>>> > Having scalars, however, also means we should preserve them. I feel that
>>> > in principle this is actually fairly straightforward. E.g. for ufuncs:
>>> >
>>> >    * np.add(scalar, scalar) -> scalar
>>> >    * np.add.reduce(arr, axis=None) -> scalar
>>> >    * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
>>> >    * np.add.reduce(scalar, axis=()) -> array
>>> >
>>> > Of course, libraries that do `np.asarray` would/could basically choose
>>> > not to preserve scalars: their signature is defined as taking strictly
>>> > array input.
>>> >
>>> > Cheers,
>>> >
>>> > Sebastian
>>> >
>>> >
>>> > [0] At best this can be a vision to decide which way they may evolve.
>>> >
>>> > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is arguably
>>> > strange. E.g. Quantity defines hash correctly, but does not fully
>>> > ensure immutability for 0-D Quantities. Ensuring immutability in a
>>> > world where "views" are a central concept requires a read-only copy.
>>> >
>>> > [2] Arguably `.item()` would always return a scalar, but it would be a
>>> > second class citizen. (Although if it returns a scalar, at least we
>>> > already have a scalar implementation.)
>>> >
>>> > [3] They are necessary due to technical debt for NumPy datatypes
>>> > though.
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion at python.org
>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
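
For reference on Sebastian's "scalar math" and ufunc points, a sketch of how existing dtypes behave today, as far as I understand current NumPy. This only illustrates the status quo with float64; it is not a proposal for how new DTypes should behave:

```python
import numpy as np

arr1d = np.array([1.0, 2.0, 3.0])

# Integer indexing returns a scalar, so "+=" rebinds the name only.
scalar = arr1d[0]
scalar += 1                      # arr1d[0] is still 1.0
first_after_scalar = float(arr1d[0])

# Indexing with `0, ...` returns a 0-d array that is a view into arr1d.
view = arr1d[0, ...]
view += 1                        # in place, writes through to arr1d[0]
first_after_view = float(arr1d[0])

# Ufuncs given scalars return scalars; a full reduction does as well.
total = np.add(np.float64(1.0), np.float64(2.0))
reduced = np.add.reduce(arr1d, axis=None)
```

The explicit `arr1d[0, ...]` is what makes the in-place intent visible, which is the "clarifies the meaning" case in the quoted list.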