File iteration in 2.2
Clarence Gardner
clarence at netlojix.com
Tue Aug 27 22:29:55 EDT 2002
Here's a question which is mostly curiosity, because I figure there
*must* be a good reason for this.
I noticed (in version 2.2) that iterating over a file seems to use a
separate buffer from the one used by the file's other operations. For example, this code
    while 1:
        line = f.readline()
        break
will leave the file positioned after the first line in the file
(assuming f was at zero before), while this
    for line in f:
        break
will leave the file pointer much further on, roughly eight kilobytes
ahead. That 8K happens to be CHUNKSIZE, defined in xreadlinesmodule.c,
and the __iter__ method of a file returns an xreadlines object,
essentially meaning (to use sloppy terminology, I suppose) that files
are not actually iterators at all.
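The 2.2 xreadlines behaviour itself can't be reproduced on a current
interpreter, but the same general effect — iteration pulling a whole chunk
from the OS while the reported position trails behind — is easy to show on
a modern CPython by comparing f.tell() with the OS-level offset of the
underlying descriptor (the file path below is just an illustration):

```python
import os
import tempfile

# A throwaway demo file (the path is illustrative), big enough to
# span more than one 8K buffer.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    for i in range(1000):
        out.write("line %d\n" % i)

f = open(path, "rb", buffering=8192)
next(f)  # pull one line through the iterator protocol

logical = f.tell()                               # what the file object reports
physical = os.lseek(f.fileno(), 0, os.SEEK_CUR)  # where the OS descriptor sits

# The buffered reader grabbed a whole chunk from the OS in one go, so
# the descriptor sits far ahead of the logical position.
assert logical < physical
f.close()
```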
It seemed to me quite odd that file's "next" method wasn't essentially
just a call to readline. So I made a version of fileobject.c with the
following changes:
Changed the __iter__ method from

static PyObject *
file_getiter(PyObject *f)
{
    return PyObject_CallMethod(f, "xreadlines", "");
}

to

static PyObject *
file_getiter(PyObject *f)
{
    Py_INCREF(f);
    return f;
}
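In other words, the second version makes the file its own iterator, which
is the usual shape of the iterator protocol. A quick Python-level
illustration, using io.StringIO as a stand-in file object (an assumption
for the demo — the 2.2 file type itself doesn't behave this way):

```python
import io

buf = io.StringIO("first\nsecond\nthird\n")

# The standard iterator protocol: __iter__ returns self, and each
# __next__ call hands back the next line.
assert iter(buf) is buf          # buf is its own iterator
assert next(buf) == "first\n"

# Because iteration and readline share one position, mixing them
# stays consistent:
assert buf.readline() == "second\n"
```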
and set the iternext slot to point to this function:

static PyObject *
file_readline_iter(PyFileObject *f)
{
    PyObject *l;

    if (f->f_fp == NULL)
        return err_closed();
    l = get_line(f, 0);
    if (!l) {
        PyErr_SetNone(PyExc_StopIteration);
        return NULL;
    }
    return l;
}
This all seemed very easy, and it all works except for one teensy
problem, namely that if you use it in a for loop, the loop never ends. I
assume that that's simply because I don't know what I'm doing and raised
StopIteration incorrectly.
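For reference, here is a small Python model of the intended
next-is-just-readline behaviour (the class name ReadlineIter is made up
for the sketch). One thing it makes visible: readline signals EOF with an
empty string, not with None or a NULL, so a C-level check along the lines
of `if (!l)` would never see end-of-file:

```python
import io

class ReadlineIter:
    """Hypothetical model of the proposed tp_iternext: one readline per next()."""

    def __init__(self, f):
        self.f = f

    def __iter__(self):
        return self

    def __next__(self):
        line = self.f.readline()
        # readline reports EOF as an empty string, never as None, so
        # the EOF test must check for "" -- a NULL-style check would
        # keep returning empty lines forever.
        if line == "":
            raise StopIteration
        return line

lines = list(ReadlineIter(io.StringIO("a\nb\n")))
```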
But I'm wondering if there's more to it than that, because I don't see
why it wasn't implemented this way...
--
Clarence Gardner
Software Engineer
NetLojix Communications
cgardner at netlojix.com