__nonzero__ of iterators

Terry Reedy tjreedy at udel.edu
Fri Apr 2 11:56:06 EST 2004


"Christian Eder" <eder at tttech.com> wrote in message
news:c4gjup$615$1 at ttt14.vie.at.tttech.ttt...
> I just discovered the following pitfall in Python 2.3.

It is only a pitfall for those who mistakenly think that iterator ==
(actual, in memory, Python) sequence.  They are related but quite different
in their Python meaning.  A sequence (array, actually) allows random
access.  An iterator only allows forward scanning.  A sequence, in general,
is accessed by indexing or slicing.  An iterator only returns values via a
.next method.  I don't know if the tutorial emphasizes these differences
enough or not.
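A minimal sketch of that difference, written so it runs on a current Python (where the built-in next(it) stands in for the it.next method mentioned above).  The names seq, it, middle, tail and so on are made up for illustration.

```python
seq = [10, 20, 30]
it = iter(seq)

# A sequence allows random access by index or slice.
middle = seq[1]
tail = seq[1:]

# An iterator only hands out items one at a time, moving forward.
first = next(it)
second = next(it)

# Trying to index the iterator is an error, not a slow lookup.
try:
    it[0]
    indexable = True
except TypeError:
    indexable = False
```

Note that the iterator never goes backwards: once 'first' has been handed out, the only way to see it again is to make a new iterator from the sequence.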

> Consider the following code :
>
>  >>> a = {}
>  >>> bool (a.keys ())
> False
>  >>> bool (a.iterkeys ())
> True
>
> So, an "empty" iterator evaluates to True in boolean context.

So what?  Why would you take bool() of an object known to be non-null?

> At a first glance, this is not what one would expect.

Depends on whether one mistakenly thinks iterator == sequence.

Consider: bool(0) == bool(0L) == bool(0.0) == False but bool('0') == True.
A pitfall?  Yes, for someone who thinks number representations can blindly
replace numbers (as, I have the impression, is true for some other
languages).  No, for someone who understands the difference in Python.
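The two cases side by side, as a small sketch for a current Python (so the 0L long literal is left out).  As far as I know, the list iterator defines neither a length nor a truth test of its own, so it falls back to the default of being true, empty or not.  The variable names are only for illustration.

```python
empty_list = []
empty_iter = iter(empty_list)

# The list knows its length, so an empty one is false.
list_truth = bool(empty_list)

# The iterator has no length to consult, so it is true by default.
iter_truth = bool(empty_iter)

# Numbers are false when zero; a non-empty string is true whatever it spells.
number_truths = (bool(0), bool(0.0))
string_truth = bool('0')
```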

> This causes
> several problems, e.g. if you operate on something expected to be
> sequence, and you guard your code with "if seq :" to avoid crashing into
> an empty sequence, you still crash if you get an empty iterator,

Passing an iterator to a sequence function is a bug that should cause a
crash.  So should passing a number string to a number function that does not
check and convert if necessary.  Consider:

if num: return hi//num

If num == '1', this will crash, as it should.  So will '''print "This
answer is " + numstring''' when numstring is an unconverted number.

> even if
> the rest of your code is able to deal with an iterator as well as with
> any other sequence type.

To repeat, an iterator is not a sequence type.  Nor is a sequence an
iterator type.  But both are iterable types.

If 'the rest of your code' deals with the sequence *as a sequence*, by
using sequence access methods, then it cannot deal with the iterator 'as
well' unless you convert the iterator to a sequence with a sequence
constructor, as in 'seq = list(it)'.   (Yes, you may want to type-guard
this).  And you put that *before* your 'if seq:' guard.  But the current
idiom (see * note below) for new code is to only do this if you need random
access (to sort, for instance).
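A sketch of that ordering, assuming a made-up function middle_item that really does need random access.  The conversion comes first, so the 'if seq:' guard is testing a list and means what it appears to mean.

```python
def middle_item(iterable):
    """Return the middle item of any iterable, or None if it is empty."""
    # Convert first: after this line seq is a real list, whatever came in.
    seq = list(iterable)
    # Now the guard is reliable, because an empty list is false.
    if seq:
        return seq[len(seq) // 2]
    return None


from_list = middle_item([1, 2, 3])
from_iterator = middle_item(iter('abcde'))
from_empty_iterator = middle_item(iter([]))
```

The price is that the whole input is held in memory at once, which is why this is worth doing only when indexing or sorting is actually needed.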

If you only need forward sequential access, the current idiom is to convert
the sequence to an iterator with 'it = iter(seq)'.  Or use a for loop which
does this for you.  Or call a function itself coded with an iter call or
for loop.  To avoid the need to test the type first, before the conversion,
iterators were defined as returning themselves in response to iter() so
that 'it = iter(it)' is innocuous in a way that 'mylist = list(mylist)' may
not be.
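A quick check of that asymmetry, with illustrative names only.  Calling iter() on an iterator gives back the very same object, while calling list() on a list builds a new copy.

```python
mylist = [1, 2, 3]
it = iter(mylist)

# iter() on an iterator returns the iterator itself: nothing is copied or reset.
same_iterator = iter(it) is it

# list() on a list makes a new list with equal contents.
copied = list(mylist)
same_list = copied is mylist
equal_contents = copied == mylist
```

So 'it = iter(it)' costs next to nothing, whereas 'mylist = list(mylist)' copies every item and breaks any aliasing the caller was relying on.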

For an explicit while loop (which is seldom needed), the idiom is

it = iter(input)
while 1:
  try:
    item = it.next()
    <do something with item>
  except StopIteration:
    break

For iterators, the item-requiring code is guarded by try: except
StopIteration: instead of  if seq:.
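The same guard in a small runnable form, assuming a made-up helper first_item and written for a current Python, where the built-in next(it) replaces calling the .next method directly.  It accepts lists and iterators alike, because it never asks the input whether it is empty.

```python
def first_item(iterable, default=None):
    """Return the first item of any iterable, or default if there is none."""
    it = iter(iterable)  # harmless if iterable is already an iterator
    try:
        return next(it)
    except StopIteration:
        # Running out of items is the iterator's way of saying "empty".
        return default


from_empty_list = first_item([])
from_empty_iterator = first_item(iter([]))
from_list = first_item([5, 6])
from_iterator = first_item(iter('ab'))
```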


* The reason to convert 'iterables' to iterator rather than list as the
common type is that the former scales better as one expands to serially
processing millions, billions, and even trillions of items.  In addition,
for statements convert to iterator anyway.  (The current exception for
old-iterator-protocol objects will, I am sure, disappear sometime in the
future.)
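A rough sketch of the scaling point, using two made-up functions.  count_up_to produces its items one at a time, and total consumes them with a for loop, so at no point does a list of all the items exist.

```python
def count_up_to(n):
    """Yield 0, 1, ..., n-1 one at a time instead of building a list."""
    i = 0
    while i < n:
        yield i
        i += 1


def total(iterable):
    """Add up the items of any iterable using only forward access."""
    result = 0
    for item in iterable:  # the for statement calls iter() for us
        result += item
    return result


from_generator = total(count_up_to(1000))
from_list = total([1, 2, 3])
```

The same total function works unchanged on a list, which is the point of settling on iterator rather than list as the common type.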

Also, in the (possibly mythical) future, Guido would like to change generic
list-returning functions and methods to return iterators instead.
('Generic' means no specific reason to do otherwise, as there obviously is
with tuple, list, list.sequence, list.sort, etc.)  So, for instance,
dict.iterkeys would be renamed dict.keys and the current dict.keys (which
effectively equals list(dict.iterkeys()) anyway) would disappear -- and
this post could no longer happen ;-)

Terry J. Reedy






