constraints on "for" magic?

Alex Martelli alex at magenta.com
Wed Jul 26 03:02:14 EDT 2000


"(Greg Weeks)" <weeks at golden.dtc.hp.com> wrote in message
news:8llqso$q75$1 at news.dtc.hp.com...
    [snip]
> How did the implementer of the "fileinput" module convince the "for"
> construct that fileinput.input() was a sequence?

"Use the source" is often good advice.  Opening the source file:
    D:\Python\Py152\Lib\fileinput.py
(or wherever the fileinput.py module is kept on your system, of
course), we find ourselves facing a modest 255 lines of code,
of which the first 75 are an extensive docstring.

Lines 124 to 238 are concerned with defining the FileInput
class (the function fileinput.input, defined on lines 80-85, is
clearly just instantiating a FileInput object and also storing it
in a module-private global variable called _state).

All the 'magic' needed for a FileInput object to behave like a
sequence in a for statement is actually contained in lines
154 to 160...:

    def __getitem__(self, i):
        if i != self._lineno:
            raise RuntimeError, "accessing lines out of order"
        line = self.readline()
        if not line:
            raise IndexError, "end of input reached"
        return line

Much of this is error-checking: __getitem__ is supposed to
be called (by the for statement) with an index i that starts at
0 and increases by 1 each time.  Once this is confirmed, the
actual operation is delegated to FileInput's own readline
method.  If there are no more lines, an IndexError must be
raised: the for statement traps this and uses it as the
indication that the sequence is finished.
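
The protocol just described can be spelled out by hand.  What follows
is only a minimal sketch: the Squares class and the manual_for function
are hypothetical names made up for illustration (neither is part of
fileinput), and manual_for only approximates what the for statement
does with __getitem__ and IndexError.  Current Python versions look for
__iter__ first, but still fall back to this protocol when it is absent.

```python
class Squares:
    """Hypothetical example class: items are 0, 1, 4, ... (n of them)."""
    def __init__(self, n):
        self._n = n
    def __getitem__(self, i):
        if i >= self._n:
            # Signals "no more items" to the for statement
            raise IndexError("end of sequence reached")
        return i * i

def manual_for(seq):
    """Roughly what 'for item in seq' does through __getitem__."""
    result = []
    i = 0
    while True:
        try:
            item = seq[i]       # calls seq.__getitem__(i)
        except IndexError:
            break               # the sequence is finished
        result.append(item)
        i = i + 1               # index starts at 0, increases by 1
    return result

# The real for statement and the hand-written loop see the same items
collected = []
for item in Squares(4):
    collected.append(item)
```

Both the for loop and manual_for(Squares(4)) collect the same four
items, which is the point: the loop never asks how long the sequence is.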

Method readline, lines 186 to 222, is substantially richer,
as it must handle all sorts of boundary cases -- moving to
the next file if the current one is exhausted, handling in-place
updating, etc, etc.


> (In "Python ESSENTIAL REFERENCE" I don't find any way of creating class
> instances that are recognized by "for" as sequences.)

Page 29: "it's possible to use class definitions to define new objects
that behave like the built-in types. To do this, supply implementations
of the special methods described in this section."  On pages 30-31,
among the "Sequence and Mapping Methods", __getitem__ is also
described; I believe you are correct in noticing that nowhere is it
clarified that this is the specific method used by the for statement,
and that IndexError is the way __getitem__ tells the for statement
that the sequence is finished (this is in fact handier than, say, having
for use __len__, because the sequence-object need not determine
beforehand how many items it contains; but, it's not immediately
and intuitively obvious!-).
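
To illustrate why not needing __len__ is handy, here is a small sketch
with a made-up class, Halvings (hypothetical, purely for illustration),
whose instances cannot say in advance how many items they hold.  Like
FileInput, it supports in-order access only, so the index is ignored.

```python
class Halvings:
    """Hypothetical example: n, n//2, n//4, ... stopping before 0.

    There is deliberately no __len__ method.
    """
    def __init__(self, n):
        self._value = n
    def __getitem__(self, i):
        # i is ignored: items are produced strictly in order
        if self._value == 0:
            raise IndexError    # tells the for statement we are done
        current = self._value
        self._value = current // 2
        return current

values = []
for v in Halvings(20):
    values.append(v)
```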

The issue may be clearer if we reproduce only the bare-bones function
of the rich fileinput module: wrap one file-like object (which supplies
a readline method) into something that can be iterated on a la:

    for line in LineSequence(fileobject):
        process(line)

To get this, and this only, without special error checking, we'll do:

class LineSequence:
    def __init__(self, fileobject):
        self._file = fileobject
    def __getitem__(self, i):
        line = self._file.readline()
        if not line:
            raise IndexError
        return line

Now these bones are bare indeed, but maybe having the essential
part of the "sequence-simulation" exposed like this can make it
easier to understand.
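
To try this out without a real file, something like the following
sketch should do.  It assumes io.StringIO from the current standard
library as a stand-in for the file object (any object with a readline
method would serve), and it repeats the class, reading from the _file
attribute set in __init__, so that the snippet runs on its own.

```python
import io

class LineSequence:
    def __init__(self, fileobject):
        self._file = fileobject
    def __getitem__(self, i):
        # The index is ignored: lines simply come in file order
        line = self._file.readline()
        if not line:
            raise IndexError    # empty string means end of file
        return line

# An in-memory stand-in for an open file (an assumption of this sketch)
fileobject = io.StringIO("first\nsecond\nthird\n")

lines = []
for line in LineSequence(fileobject):
    lines.append(line)
```

Each line keeps its trailing newline, exactly as readline returns it.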


Alex





