Iterators and adaptation: (WAS Re: eof?)

Michael Robin me at mikerobin.com
Wed Aug 15 23:24:09 EDT 2001


"Steven D. Majewski" <sdm7g at Virginia.EDU> wrote in message
> > "Dietmar Lang" <dietmar at wohnheim.fh-wedel.de> wrote in message
> There are several ways.  Best overall, today (Python 2.1):
> > > >
> > > >     for line in fh.xreadlines():
> > > >         process(line)
> And in 2.2, file objects can automatically return an iterator 
> in the context of a for loop, so you can just say:
> 

   [0]
> 	for line in open( filename ):
> 		process(line)
> 

(x)readlines is pretty explicit, but in general it would be nice, as in
the case above (or fileinput, etc.), to be able to know what kind of
object is being returned by the iterator/sequence. For example:

 [1]	for line in open( filename ).lines:
 		process(line)

this way you could write:

 [1]	for char in open( filename ).chars:
 		...
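
A minimal sketch of how the ".lines" / ".chars" idea could look in
present-day Python. ExplicitFile and its two attributes are hypothetical
names made up for illustration; real file objects have no such
attributes.

```python
import io


class ExplicitFile:
    """Hypothetical wrapper exposing named iteration views over a text stream."""

    def __init__(self, stream):
        self._stream = stream

    @property
    def lines(self):
        # Line-by-line iteration, which the stream already supports natively.
        return iter(self._stream)

    @property
    def chars(self):
        # Two-argument iter(): call read(1) repeatedly until it returns "".
        return iter(lambda: self._stream.read(1), "")


demo_lines = list(ExplicitFile(io.StringIO("ab\ncd\n")).lines)
demo_chars = list(ExplicitFile(io.StringIO("ab\n")).chars)
```

The point is only that the loop header then says what is being
iterated over: "for char in f.chars" vs. a bare "for x in f".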

or whatever you wanted. When an object is its own iterator,
there could be name conflicts, so you could even
(in a kind of PEP246'ish style) use:

 [2]	for csvTuple in adapt( open( filename ), FileAdaptors.CSV):
 		process( csvTuple)
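
A rough sketch of what [2] might look like if written out. Both adapt()
and FileAdaptors here are toy stand-ins written for this example, not an
existing library; the __conform__ hook is borrowed from the PEP 246
proposal, and the fallback of simply calling the protocol is an
assumption of this sketch.

```python
import csv
import io


class FileAdaptors:
    """Hypothetical namespace of adaptor callables; not a real module."""

    @staticmethod
    def CSV(stream):
        # csv.reader yields lists of fields; convert each row to a tuple.
        return (tuple(row) for row in csv.reader(stream))

    @staticmethod
    def Lines(stream):
        return iter(stream)


def adapt(obj, protocol):
    """Toy adapt(): let the object answer first, else let the protocol wrap it."""
    conform = getattr(obj, "__conform__", None)
    if conform is not None:
        result = conform(protocol)
        if result is not None:
            return result
    return protocol(obj)


rows = list(adapt(io.StringIO("a,b\n1,2\n"), FileAdaptors.CSV))
```

Here io.StringIO has no __conform__ method, so adapt() falls through to
calling FileAdaptors.CSV on the stream.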

"open(file)" could still produce an iterator that does lines by
default, but you can be explicit if you want, and the iterator could
respond to requests to adapt. (Or it could return a "stub" iterator
that needs to be specialized before use. If next() were called there'd
be a runtime error saying "What do you want to iterate over?: [chars,
lines, ...]") (? Unless specified otherwise, adaptation is only
guaranteed to work before the first next() call.)
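
The "stub" variant could be sketched roughly like this.
StubFileIterator, its adapt() method and the choice of exception types
are all assumptions made for illustration, not anything that exists in
Python.

```python
import io


class StubFileIterator:
    """Hypothetical iterator that must be specialized before first use."""

    KINDS = ("chars", "lines")

    def __init__(self, stream):
        self._stream = stream
        self._source = None     # set by adapt()
        self._started = False   # becomes True at the first next() call

    def adapt(self, kind):
        # Adaptation is only allowed before iteration has begun.
        if self._started:
            raise RuntimeError("cannot adapt after the first next() call")
        if kind == "lines":
            self._source = iter(self._stream)
        elif kind == "chars":
            self._source = iter(lambda: self._stream.read(1), "")
        else:
            raise ValueError(f"unknown kind {kind!r}; choose from {self.KINDS}")
        return self

    def __iter__(self):
        return self

    def __next__(self):
        if self._source is None:
            raise TypeError(
                f"What do you want to iterate over?: {list(self.KINDS)}")
        self._started = True
        return next(self._source)
```

Usage would be something like "for c in stub.adapt('chars'): ...",
while a bare "for x in stub:" fails on the first step with the
question above.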

One way to do this is to make __iter__ parametric - but then we'd need
a way to get the value(s) there. Perhaps by asking for the iterator
explicitly:

 [3]	for csvTuple in open( filename ).__iter__(CSV):
 		process( csvTuple)
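
One way a parametric __iter__ could be spelled, as a sketch only.
ParametricFile and the "kind" argument are hypothetical. Note that a
plain for loop calls __iter__ with no arguments, so the parameter needs
a default, and passing a value means calling __iter__ explicitly as
in [3].

```python
import csv
import io


class ParametricFile:
    """Hypothetical wrapper whose __iter__ accepts an optional parameter."""

    def __init__(self, stream):
        self._stream = stream

    def __iter__(self, kind="lines"):
        # The default keeps "for line in f" working unchanged.
        if kind == "lines":
            return iter(self._stream)
        if kind == "csv":
            return (tuple(row) for row in csv.reader(self._stream))
        raise ValueError(f"unknown kind {kind!r}")


default_items = list(ParametricFile(io.StringIO("x\ny\n")))
csv_items = list(ParametricFile(io.StringIO("a,b\n1,2\n")).__iter__("csv"))
```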

The other option is similar to "for x in dict" doing keys like "for x in
dict.iterkeys()", and also having "itervalues()" and "iteritems()" -
but I think it's nice to be truly parametric vs. using an indicative
method name (but even that's better than nothing), such as using:

 [4]	for line in open( filename ).iterlines():
 		...

(If you're dealing with iterators implicitly produced by generator
functions, there may be no way to do this on the base object protocol,
unless functions that yield'ed supported an optional __whatToGenerate
keyword parameter or some other hack.)
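
For the generator-function case, the keyword-parameter hack might look
something like the following. read_units and what_to_generate are
made-up names for this sketch, and this is just an ordinary argument,
not a protocol the language knows anything about.

```python
import io


def read_units(stream, what_to_generate="lines"):
    """Hypothetical generator function; the keyword picks what gets yielded."""
    if what_to_generate == "lines":
        yield from stream
    elif what_to_generate == "chars":
        for line in stream:
            # Iterating a str yields its characters one at a time.
            yield from line
    else:
        # Inside a generator this only fires on the first next(),
        # not when read_units() is called.
        raise ValueError(f"unknown unit {what_to_generate!r}")


chars = list(read_units(io.StringIO("ab\nc\n"), what_to_generate="chars"))
lines = list(read_units(io.StringIO("ab\nc\n")))
```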

It seems like even if not enforced by the language, there should be a
preferred style for doing this sort of thing, at least for modules in
the std distribution. Personally, I think the default and
non-requested iterator protocol on file-like objects (for example) is
probably a good thing and makes for short code - but the option of
being explicit through method [4] or [3] (if deemed worth exploring)
should be provided. This is more for the purpose of self-documenting
code than any grand unification principle.

thanks,
mike


