[Python-3000] Iterators for dict keys, values, and items == annoying :)

Fri Mar 24 01:01:08 CET 2006

Guido van Rossum wrote:
> On 3/23/06, Ian Bicking <ianb at colorstudy.com> wrote:
> [Guido]
> 
>>>Testing whether an iterator is empty or not is an oxymoron; the only
>>>legit way is to call next() and see whether it raises StopIteration.
>>>This is the fundamental confusion I am talking about. It is NOT
>>>"natural enough". It reveals a fundamental misunderstanding of the
>>>design of the iterator protocol.
>>
>>I'm talking about a use case, not the protocol.  Where iterators are
>>used, it is very common that you also want to distinguish between zero
>>and some items.
> 
> 
> Really? Methinks you are thinking of a fairly specific context -- when
> presenting database query results to a user. The problem IMO lies in
> SQLObject (which I admit I've never used) or perhaps in SQL itself, or
> the specific underlying DB. In most other situations, you have an
> honest-to-god container (e.g. a dict) which you can test for emptiness
> before even asking for an iterator over its items. When all you have
> is a query represented as an iterator this doesn't fly. That's why
> some DB API implementations return the number of results as the
> non-standard return value of the query API (at least that's what I
> recall -- it's been a while since I used the DB API).

In SQLObject it came about due to a desire to lazily load objects out of 
a query.  The lazy behavior had other problems (mostly introducing 
concurrency where you wouldn't expect it).  In addition, the query is only 
run when you start iterating.  I'm not sure if that is good or bad 
design -- that queries are iterable doesn't seem that bad, except that 
the query is only invoked with iter() and that doesn't give very good 
access to the actual executed-query object; it's all too implicit.  I 
don't know if the same issues exist for .items/.keys; I guess it would 
only be an issue if you passed one of the iterators to some routine that 
didn't have access to the original dict.

The identical problem does exist for all generators.  Using ad hoc flags 
in for loops isn't a great solution.  It's all somewhat similar to the 
repr() problem as well.
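
To make the zero-versus-some case concrete, here is a minimal sketch of 
one alternative to an ad hoc flag: pull the first item, then chain it 
back on.  The helper name peek() and the sentinel are hypothetical, and 
the code is written for current Python 3, where .next() is spelled 
next(it).

```python
import itertools

_EMPTY = object()  # hypothetical sentinel marking "no first item"


def peek(iterable):
    """Return (is_empty, iterator) without losing the first item."""
    it = iter(iterable)
    first = next(it, _EMPTY)
    if first is _EMPTY:
        return True, iter(())
    # Put the consumed item back in front of the remaining ones.
    return False, itertools.chain([first], it)


def gen(n):
    """Stand-in generator; any iterator would do."""
    for i in range(n):
        yield i


empty_flag, no_items = peek(gen(0))
nonempty_flag, items = peek(gen(3))
```

The caller has to use the returned iterator rather than the original 
one, which is exactly the kind of wrapping the protocol itself does not 
offer.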

Coming back around to the idea of implementing __getitem__ and such, I 
suppose a list-like iterator wrapper could be useful.  That would 
consume and retain the results of the iterator lazily to satisfy the 
things done to the object.  That would be kind of interesting; I 
implemented several such methods on the select result object in 
SQLObject for that purpose, and that aspect actually works pretty well. 
There are some predictability problems, though.  bool(obj) would only 
have to consume one item, but len(obj) would consume the entire thing, 
and usually len() is a pretty innocuous function to use.
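
A rough sketch of such a wrapper, to show the bool()/len() asymmetry. 
This is not the SQLObject code; the class name LazyList and its 
internals are assumptions made up for illustration, in Python 3 syntax.

```python
class LazyList:
    """Hypothetical list-like wrapper that caches iterator results on demand."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill(self, n=None):
        """Consume until the cache holds n items (everything if n is None)."""
        while n is None or len(self._cache) < n:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                break

    def __bool__(self):
        self._fill(1)  # one item is enough to answer
        return bool(self._cache)

    def __len__(self):
        self._fill()  # has to exhaust the underlying iterator
        return len(self._cache)

    def __getitem__(self, index):
        if isinstance(index, slice) or index < 0:
            self._fill()  # slices and negative indexes need everything
        else:
            self._fill(index + 1)
        return self._cache[index]

    def __iter__(self):
        i = 0
        while True:
            self._fill(i + 1)
            if i >= len(self._cache):
                return
            yield self._cache[i]
            i += 1


consumed = []  # records what the wrapper has pulled so far


def numbers():
    for i in range(5):
        consumed.append(i)
        yield i


lazy = LazyList(numbers())
```

With this sketch, bool(lazy) pulls a single item, lazy[2] pulls up to 
the third, and len(lazy) drains the rest, which is the surprise 
described above.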

If this were done, it would be nice if an iterator could give hints, like 
a faster implementation of __len__ than the fallback behavior that can 
only use .next().
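
For reference, later Python versions (3.4 and up) grew something along 
these lines: the optional __length_hint__ method and 
operator.length_hint().  The sketch below shows an iterator offering 
such a hint; the class CountedRows is a made-up example, not anything 
from SQLObject.

```python
import operator


class CountedRows:
    """Hypothetical iterator that knows how many items remain."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def __length_hint__(self):
        # An estimate of the remaining items; nothing is consumed.
        return len(self._rows) - self._pos


rows = CountedRows(["a", "b", "c"])
hint_before = operator.length_hint(rows)
first = next(rows)
hint_after = operator.length_hint(rows)

# A plain generator offers no hint, so the default is returned.
no_hint = operator.length_hint((x for x in "abc"), 0)
```

The hint is only an estimate, so a wrapper could use it to size things 
up front but would still need the fallback for an exact len().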

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org