Coding style

Tue Jul 18 18:53:48 EDT 2006

Volker Grabsch wrote:
> Bruno Desthuilliers <onurb at xiludom.gro> wrote:
> 
>>Carl Banks wrote:
>>
>>>Bruno Desthuilliers wrote:
>>>
>>>I'm well aware of Python's semantics, and it's irrelevant to my
>>>argument.
> 
> [...]
> 
>>>If the language
>>>were designed differently, then the rules would be different.
>>
>>Totally true - and totally irrelevant IMHO.
> 
> 
> I strongly advise against treating each other's thoughts as irrelevant.
> Assuming the opposite is the basis of every public discussion forum.

"Irrelevant" may not be the best expression of my thought here - it's 
just that Carl's assertion is kind of a tautology and doesn't add 
anything to the discussion. If Python had been designed as statically 
typed (with declarative typing), the rules would be different. Yeah, 
great. And now ?

> I assume there is a flaw in Python here. To explain this, I'd like to
> make Bruno's point

Actually Carl's point, not mine.

> clearer. As usual, code says more than a
> thousand words (and vice versa :-)).
> 
> Suppose you have two functions which somehow depend on the emptiness
> of a sequence. This is a stupid example, but it demonstrates at
> least the two proposed programming styles:
> 
> ------------------------------------------------------
> 
> >>> def test1(x):
> ...     if x:
> ...         print "Non-Empty"
> ...     else:
> ...         print "Empty"
> ...
> >>> def test2(x):
> ...     if len(x) > 0:
> ...         print "Non-Empty"
> ...     else:
> ...         print "Empty"
> ------------------------------------------------------
> 
> Bruno

Carl

> pointed out a subtle difference in the behaviour of those
> functions:
> 
> ------------------------------------------------------
> 
> >>> a = []
> >>> test1(a)
> Empty
> >>> test1(iter(a))
> Non-Empty
> >>> test2(a)
> Empty
> >>> test2(iter(a))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "<stdin>", line 2, in test2
> TypeError: len() of unsized object
> ------------------------------------------------------
> 
> 
> While test1() returns a wrong/random result when called with an
> iterator, the test2() function breaks when being called wrongly.

Have you tried these functions with a numpy array ?

> So if you accidentally call test1() with an iterator, the program
> will do something unintended, and the source of that bug will be
> hard to find. So Bruno is IMHO right in calling that the source
> of a subtle bug.

Actually it's Carl who makes that point - MHO being that it's a 
programmer error to call a function with a param of the wrong type.

> However, if you call test2() with an iterator, the program will
> cleanly break early enough with an exception. That is generally
> wanted in Python. You can see this all over the language, e.g.
> with dictionaries:
> 
> ------------------------------------------------------
> 
> >>> d = { 'one': 1 }
> >>> print d['one']
> 1
> >>> print d['two']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> KeyError: 'two'
> ------------------------------------------------------
> 
> Python could have been designed to return None when d['two'] is
> evaluated, as some other (bad) programming languages would. This would
> mean that the problem shows up later in the program, making it easy
> to produce a subtle bug. It would take some effort to figure out the
> real cause, i.e. that d had no entry for 'two'.

I don't think the comparison is right. The equivalent situation would be 
to have a function trying to access d['two'] on a dict-like type that 
would return a default value instead of raising a KeyError.
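
To make that comparison concrete, here is a minimal sketch, assuming Python 3. DefaultingDict and lookup_two are hypothetical names invented for this illustration; only dict and its __missing__ hook are real.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass: missing keys yield None, not KeyError."""

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        return None


def lookup_two(d):
    # Written with a plain dict in mind: relies on KeyError for absent keys.
    return d['two']


plain = {'one': 1}
lenient = DefaultingDict(one=1)

try:
    lookup_two(plain)
    plain_result = "no error"
except KeyError:
    plain_result = "KeyError"

# Same function, dict-like argument: the missing key goes unnoticed.
lenient_result = lookup_two(lenient)
```

The function is unchanged in both calls; only the type of the argument decides whether the mistake surfaces immediately or is silently swallowed.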

> Luckily, Python throws an exception (KeyError) just at the original
> place where the initial mistake occurred. If you *want* to get None in
> case of a missing key, you'll have to say this explicitly:
> 
> ------------------------------------------------------
> 
> >>> print d.get('two', None)
> None
> ------------------------------------------------------
> 
> So maybe "bool()" should also break with an exception if an object
> has neither a __nonzero__ nor a __len__ method, instead of defaulting
> to True. 

FWIW, Carl's main example is with numpy arrays, that have *both* methods 
- __nonzero__ raising an exception.

> Or a more strict variant of bool() called nonempty() should
> exist.
> 
> Iterators don't have a meaningful Boolean representation, because
> phrases like "is zero" or "is empty" don't make sense for them.

If so, almost no type actually has a "meaningful" boolean value. I'd 
rather say that iterators being unsized, the mere concept of an "empty" 
iterator has no meaning.

> So
> instead of answering "false", an iterator should throw an exception
> when being asked whether it's empty.

> If a function expects an object to have a certain protocol (e.g.
> sequence), and the given object doesn't support that protocol,
> an exception should be raised.

So you advocate static typing ? Note that numpy arrays actually have 
both __len__ and __nonzero__ defined, the second being defined to 
forbid boolean coercion...

> This usually happens automatically
> when the function calls a non-existing method, and it plays very
> well with duck typing.
> 
> test2() behaves that way, but test1() doesn't. The reason is
> sloppiness in Python. Python should handle that problem as strictly
> as it handles a missing key in a dictionary. Unfortunately, it
> doesn't.

Then proceed to write a PEP proposing that evaluating the truth value of 
an iterator should raise a TypeError. Just like numpy arrays do - as a 
decision of its authors.
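
What such a proposal would amount to can be approximated today with a wrapper. This is only a sketch assuming Python 3; StrictIter is a hypothetical class made up for the example, not an existing or proposed builtin:

```python
class StrictIter:
    """Hypothetical iterator wrapper whose truth value is an error."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __bool__(self):  # __nonzero__ in Python 2
        raise TypeError("iterators have no truth value")


# A plain iterator is always true, even over an empty list.
plain_truth = bool(iter([]))

try:
    bool(StrictIter([]))
    strict_truth = "no error"
except TypeError:
    strict_truth = "TypeError"

# Ordinary iteration is unaffected by the stricter truth test.
items = list(StrictIter([1, 2]))
```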

> I don't agree with Bruno

s/bruno/Carl/

> that it's more natural to write
>     if len(a) > 0:
>     ...
> instead of
>     if a:
>     ...
> 
> But I think that this is a necessary kludge you need to write
> clean code. Otherwise you risk creating subtle bugs.

s/you risk creating/careless programmers will have to face/

And FWIW, this is clearly not the opinion of the numpy authors, who state 
that having len > 0 doesn't mean the array is "not empty"...

> This advice,
> however, only applies when your function wants a sequence, because
> only in that case can you expect "len(a)" to work.

Since sequence types are defined as having a False value when empty, 
this test is redundant *and* "will create subtle bugs" when applied to a 
numpy array.
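
The redundancy is easy to check for the builtin sequence types. A small sketch, assuming Python 3 (where range objects are sequences too); the sample values are arbitrary:

```python
# Empty and non-empty instances of builtin sequence types.
samples = [[], [0], (), (None,), "", "x", range(0), range(3)]

# For each of them, the truth value and the len() test give the same answer.
agree = all(bool(s) == (len(s) > 0) for s in samples)
```

Note that [0] and (None,) count as true: what matters is the number of items, not their values.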

> I also agree with Carl that "if len(a) > 0" is less universal than
> "if a", because the latter also works with container-like objects
> that have a concept of emptiness, 

s/emptiness/boolean value/

> but not of length.

> However, this case is less likely to happen than shooting yourself
> in the foot by accidentally passing an iterator to the function
> without getting an exception. I think this flaw in Python is deep
> enough to justify the "len() > 0" kludge.

It surely justifies some thinking on the boolean value of iterators. Since 
the common idiom for testing non-None objects is an explicit identity 
test against None - which makes sense since empty sequences and zero 
numerics evaluate to False in a boolean context - the less inappropriate 
solution would be to have iterators implement __nonzero__ like numpy 
arrays do.
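
For readers less familiar with the idiom, a minimal sketch assuming Python 3; describe is a hypothetical function used only to show the difference between the identity test and the truth test:

```python
def describe(value=None):
    # Identity test: only None itself means "nothing was passed".
    if value is None:
        return "not given"
    # Truth test: empty sequences and zero numerics are false too.
    if not value:
        return "given but empty or zero"
    return "given"


results = [describe(), describe([]), describe(0), describe([1])]
```

A bare "if not value" as the first test would lump None, [] and 0 together, which is why the identity test comes first.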

> 
> IMHO, that flaw of Python should be documented in a PEP as it violates
> Python's principle of being explicit.

Here again, while I agree that there's room for improvement, I don't 
agree on this behaviour being a "flaw" - "minor wart" would better 
describe the situation IMHO.