File iteration in 2.2
Clarence Gardner
clarence at netlojix.com
Tue Aug 27 22:29:55 EDT 2002
Here's a question which is mostly curiosity, because I figure there
*must* be a good reason for this.
I noticed (in version 2.2) that iterating over a file seems to use a
separate buffer from the one used by the file's other operations. For example, this code
    while 1:
        line = f.readline()
        break
will leave the file positioned after the first line in the file
(assuming f was at zero before), while this
    for line in f:
        break
will leave the file pointer much further on, roughly eight kilobytes
ahead. That 8K happens to be CHUNKSIZE, defined in xreadlinesmodule.c,
and the __iter__ method of a file returns an xreadlines object,
essentially meaning (to use sloppy terminology, I suppose) that files
are not actually iterators at all.
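The 2.2 xreadlines behaviour itself can't be reproduced on a current
interpreter, but the same general effect — iteration pulling a whole chunk
from the OS while the reported position trails behind — is easy to show on
a modern CPython by comparing f.tell() with the OS-level offset of the
underlying descriptor (the file path below is just an illustration):

```python
import os
import tempfile

# A throwaway demo file (the path is illustrative), big enough to
# span more than one 8K buffer.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    for i in range(1000):
        out.write("line %d\n" % i)

f = open(path, "rb", buffering=8192)
next(f)  # pull one line through the iterator protocol

logical = f.tell()                               # what the file object reports
physical = os.lseek(f.fileno(), 0, os.SEEK_CUR)  # where the OS descriptor sits

# The buffered reader grabbed a whole chunk from the OS in one go, so
# the descriptor sits far ahead of the logical position.
assert logical < physical
f.close()
```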
It seemed to me quite odd that file's "next" method wasn't essentially
just a call to readline. So I made a version of fileobject.c with the
following changes:
Changed the __iter__ method from

static PyObject *
file_getiter(PyObject *f)
{
    return PyObject_CallMethod(f, "xreadlines", "");
}

to

static PyObject *
file_getiter(PyObject *f)
{
    Py_INCREF(f);
    return f;
}
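In other words, the second version makes the file its own iterator, which
is the usual shape of the iterator protocol. A quick Python-level
illustration, using io.StringIO as a stand-in file object (an assumption
for the demo — the 2.2 file type itself doesn't behave this way):

```python
import io

buf = io.StringIO("first\nsecond\nthird\n")

# The standard iterator protocol: __iter__ returns self, and each
# __next__ call hands back the next line.
assert iter(buf) is buf          # buf is its own iterator
assert next(buf) == "first\n"

# Because iteration and readline share one position, mixing them
# stays consistent:
assert buf.readline() == "second\n"
```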
and set the iternext slot to point to this function:

static PyObject *
file_readline_iter(PyFileObject *f)
{
    PyObject *l;

    if (f->f_fp == NULL)
        return err_closed();
    l = get_line(f, 0);
    if (!l) {
        PyErr_SetNone(PyExc_StopIteration);
        return NULL;
    }
    return l;
}
This all seemed very easy, and it all works except for one teensy
problem, namely that if you use it in a for loop, the loop never ends. I
assume that that's simply because I don't know what I'm doing and raised
StopIteration incorrectly.
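For reference, here is a small Python model of the intended
next-is-just-readline behaviour (the class name ReadlineIter is made up
for the sketch). One thing it makes visible: readline signals EOF with an
empty string, not with None or a NULL, so a C-level check along the lines
of `if (!l)` would never see end-of-file:

```python
import io

class ReadlineIter:
    """Hypothetical model of the proposed tp_iternext: one readline per next()."""

    def __init__(self, f):
        self.f = f

    def __iter__(self):
        return self

    def __next__(self):
        line = self.f.readline()
        # readline reports EOF as an empty string, never as None, so
        # the EOF test must check for "" -- a NULL-style check would
        # keep returning empty lines forever.
        if line == "":
            raise StopIteration
        return line

lines = list(ReadlineIter(io.StringIO("a\nb\n")))
```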
But I'm wondering if there's more to it than that, because I don't see
why it wasn't implemented this way...
--
Clarence Gardner
Software Engineer
NetLojix Communications
cgardner at netlojix.com