[Python-Dev] Clean way in python to test for None, empty, scalar, and list/ndarray? A prayer to the gods of Python

Fri Jun 14 22:55:33 CEST 2013

On Fri, 14 Jun 2013 21:12:00 +0200, Martin Schultz <maschu09 at gmail.com> wrote:
> 2. Testing for empty lists or empty ndarrays:
> 
>  In principle, `len(x) == 0` will do the trick. **BUT** there are several
> caveats here:
>    - `len(scalar)` raises a TypeError, so you will have to use try and
> except or find some other way of testing for a scalar value
>    - `len(numpy.array(0))` (i.e. a scalar coded as numpy array) also raises
> a TypeError ("unsized object")
>    - `len([[]])` returns a length of 1, which is somehow understandable,
> but - I would argue - perhaps not what one might expect initially
> 
>  Alternatively, numpy arrays have a size attribute, and
> `numpy.array([]).size`, `numpy.array(8.).size`, and
> `numpy.array([8.]).size` all return what you would expect. And even
> `numpy.array([[]]).size` gives you 0. Now, if I could convert everything to
> a numpy array, this might work. But have you ever tried to assign a list of
> mixed data types to a numpy array? `numpy.array(["a",1,[2,3],(888,9)])`
> will fail, even though the list inside is perfectly fine as a list.

In general you test whether or not something is empty in Python by
testing its truth value.  Empty things are False.  Numpy seems to
follow this using size, from the limited examples you have given:

   >>> bool(numpy.array([[]]))
   False
   >>> bool(numpy.array([[1]]))
   True

I have no idea what the definition of numpy.array.size is that it would
return 0 for [[]], so its return value obviously defies my intuition
as much as len([[]]) seems to have initially defied yours :)
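
A minimal sketch of the truth-value idiom, restricted to built-in
containers. The helper name is_empty is made up for illustration, and
numpy is deliberately left out: a multi-element array raises ValueError
when asked for its truth value, and numpy's handling of empty arrays may
differ between versions.

```python
def is_empty(container):
    """Return True if a built-in container has no elements.

    Hypothetical helper: relies on the convention that empty
    containers are falsy. Do not use it on numbers, since 0 is
    falsy as well.
    """
    return not container


print(is_empty([]))         # True
print(is_empty([[]]))       # False: the outer list holds one (empty) list
print(is_empty(""))         # True
print(is_empty({"a": 1}))   # False
```

Note that [[]] counts as non-empty here, which matches len([[]]) == 1
rather than numpy's size of 0.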

> 3. Testing for scalar:
> 
>  Let's suppose we knew the number of non-empty elements, and this is 1.
> Then there are occasions when you want to know if you have in fact `6` or
> `[6]` as an answer (or maybe even `[[6]]`). Obviously, this question is
> also relevant for numpy arrays. For the latter, a combination of size and
> ndim can help. For other objects, I would be tempted to use something like
> `isiterable()`, however, this function doesn't exist, and there are
> numerous discussions how one could or should find out if an object is
> iterable - none of them truly intuitive. (and is it true that **every**
> iterable object is a descendant of collections.Iterable?)

No, but...I'm not 100% sure about this as I tend to stay away from ABCs
myself, but my understanding is that collections.Iterable checks if an
object is iterable when you use it in an isinstance check.  There are
probably ways to fool it, but I think you could argue that any such data
types are broken.
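
One known way to fool the check, as a sketch: a class that supports
iteration only through the old __getitem__ protocol. The class
LegacySequence and the helper is_iterable are hypothetical names, and
the import uses collections.abc, where the ABCs live from Python 3.3 on.

```python
from collections.abc import Iterable


class LegacySequence:
    """Hypothetical class that is iterable only via __getitem__."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # The IndexError raised past the end is what stops iteration.
        return self._items[index]


def is_iterable(obj):
    """Ask iter() directly instead of consulting the ABC."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


legacy = LegacySequence([1, 2, 3])
print(isinstance([1, 2], Iterable))   # True
print(isinstance(6, Iterable))        # False
print(isinstance(legacy, Iterable))   # False: the class defines no __iter__
print(is_iterable(legacy))            # True
print(list(legacy))                   # [1, 2, 3]
```

The isinstance check only looks for an __iter__ method, so calling
iter() inside try/except is the more permissive test.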

> 4. Finding the number of elements in an object:
> 
>  From the discussion above, it is already clear that `len(x)` is not very
> robust for doing this. Just to mention another complication: `len("abcd")`
> returns 4, even though this is only one string. Of course this is correct,
> but it's a nuisance if you need to find the number of elements of a list
> of strings and if it can happen that you have a scalar string instead of a
> 1-element list. And, believe me, such situations do occur!

len is robust when you consider that it only applies to sequences (see
below).  (I don't know what it means to "code a scalar as a numpy
array", but if it is still a scalar, it makes sense that it raises
a TypeError on len...it should.)

> 5. Forcing a scalar to become a 1-element list:
> 
>  Unfortunately, `list(77)` throws an error, because 77 is not iterable.
> `numpy.array(77)` works, but - as we saw above - there will be no len
> defined for it. Simply writing `[x]` is dangerous, because if x is a list
> already, it will create `[[77]]`, which you generally don't want. Also,
> `numpy.array([x])` would create a 2D array if x is already a 1D array or a
> list. Often, it would be quite useful to know for sure that a function
> result is provided as a list, regardless of how many elements it contains
> (because then you can write `res[0]` without risking the danger to throw an
> exception). Does anyone have a good suggestion for this one?

Well, no.  If the list is empty res[0] will throw an error.  You need
to know both that it is indexable (note: not iterable...an object can be
iterable without being indexable) and that it is non-empty.  Well-behaved
objects should, I think, pass an isinstance check against collections.Sequence.
(I can't think of a good way to check for indexability without the abc.)
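
The two conditions (indexable and non-empty) can be sketched as below.
The helper first_item and its default parameter are hypothetical, and
the ABC is imported from collections.abc as in Python 3.3 and later.

```python
from collections.abc import Sequence


def first_item(obj, default=None):
    """Return obj[0] if obj is a non-empty Sequence, else default."""
    if isinstance(obj, Sequence) and len(obj) > 0:
        return obj[0]
    return default


print(first_item([77]))         # 77
print(first_item([]))           # None: indexable but empty
print(first_item(77))           # None: an int is not a Sequence
print(first_item(iter([77])))   # None: iterable but not indexable
print(first_item("abcd"))       # a  (str is a Sequence too)
```

The last line shows that strings still pass the Sequence check, so a
caller that wants to treat a string as a single value has to exclude
str separately.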

> Enough complaining. Here comes my prayer to the python gods: **Please**
> 
>  - add a good `isiterable` function

That would be spelled isinstance(x, collections.Iterable), it seems.

>  - add a `size` attribute to all objects (I wouldn't mind if this is None
> in case you don't really know how to define the size of something, but it
> would be good to have it, so that `anything.size` would never throw an error)

Why?  What is the definition of 'size' that makes it useful outside
of numpy?

>  - add an `isscalar` function which would at least try to test if something
> is a scalar (meaning a single entity). Note that this might give different
> results compared to `isiterable`, because one would consider a scalar
> string as a scalar even though it is iterable. And if `isscalar` would
> throw exceptions in cases where it doesn't know what to do: fine - this can
> be easily captured.

This I sort of agree with.  I've often enough wanted to know if something
is a non-string iterable.  But you'd have to decide if bytes/bytearray
is a sequence of integers or a scalar...
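
A sketch of such a check, with the bytes/bytearray decision left to the
caller. The function name and the bytes_are_scalar flag are invented for
illustration; nothing like this exists in the standard library.

```python
from collections.abc import Iterable


def is_nonstring_iterable(obj, bytes_are_scalar=True):
    """Return True for iterables other than str.

    bytes_are_scalar is a hypothetical switch: when True, bytes and
    bytearray are treated like str (one value), otherwise they count
    as sequences of integers.
    """
    scalar_types = (str,)
    if bytes_are_scalar:
        scalar_types += (bytes, bytearray)
    return isinstance(obj, Iterable) and not isinstance(obj, scalar_types)


print(is_nonstring_iterable([1, 2]))                          # True
print(is_nonstring_iterable("abcd"))                          # False
print(is_nonstring_iterable(6))                               # False
print(is_nonstring_iterable(b"abcd"))                         # False
print(is_nonstring_iterable(b"abcd", bytes_are_scalar=False)) # True
```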

>  - enable the `len()` function for scalar variables such as integers or
> floats. I would tend to think that 1 is a natural answer to what the length
> of a number is.

That would screw up the ABC type hierarchy...the existence of len
indicates that an iterable is indexable.

--David