A file object is an iterator, not a container
Oren Tirosh
oren-py-l at hishome.net
Wed Nov 14 01:25:23 EST 2001
I am currently working on a framework for dataflow-oriented programming.
This framework makes it easy to create dataflows by connecting sources,
transformations and destinations. It makes extensive use of the
new iterators and generator functions defined in Python 2.2. A first public
release is expected soon.
During the design I encountered an inconsistency in the way Python
uses iterators for files: Python pretends that an open file object is a
container that can create iterators, while in fact an open file object
already *is* an iterator with a "nonstandard" protocol (readline instead of
next). The real container is the file residing on disk, not the open
file object.
To see the difference, try creating two iterators from one list and two
iterators from one file. The two list iterators can be read independently,
but the two file iterators cannot.
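A minimal sketch of that difference (the temp file and its contents are just illustrative):

```python
import os
import tempfile

# Write a small three-line file to demonstrate with.
tmp = tempfile.NamedTemporaryFile('w+', delete=False)
tmp.write('line1\nline2\nline3\n')
tmp.flush()
tmp.close()

# Two iterators over one list advance independently.
lst = ['line1\n', 'line2\n', 'line3\n']
i, j = iter(lst), iter(lst)
assert next(i) == 'line1\n'
assert next(j) == 'line1\n'   # independent: j also starts at the top

# Two "iterators" over one open file share the same position,
# because iter(f) just returns the open file object itself.
f = open(tmp.name)
fi, fj = iter(f), iter(f)
assert next(fi) == 'line1\n'
assert next(fj) == 'line2\n'  # coupled: fj continues where fi stopped
f.close()
os.unlink(tmp.name)
```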
My proposal is to make an open file object obey the iterator protocol:

__iter__():
    return self

next():
    return next line
    raise StopIteration on EOF
    <CONTROVERSIAL> close file on EOF </CONTROVERSIAL>
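A minimal sketch of the proposed protocol as a wrapper class (LineIterator and close_on_eof are illustrative names, not part of the proposal; the method is spelled __next__ here, the modern equivalent of next()):

```python
import io

class LineIterator:
    """Sketch of the proposed file-iterator protocol."""
    def __init__(self, fileobj, close_on_eof=False):
        self._file = fileobj
        self._close_on_eof = close_on_eof  # the <CONTROVERSIAL> part

    def __iter__(self):
        return self               # an open file is its own iterator

    def __next__(self):
        line = self._file.readline()
        if not line:              # readline() returns '' only at EOF
            if self._close_on_eof:
                self._file.close()
            raise StopIteration
        return line

# Demo on an in-memory file.
demo = LineIterator(io.StringIO('a\nb\n'))
lines = list(demo)
```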
Is there any chance this could still make it to the final 2.2 release or
would a thorough discussion of this proposal on the list take too long?
A natural future extension of this idea would be to define an unopened file
object. This object would truly act as a container, because it is a factory
for multiple, independent iterators over itself: open files with different
file descriptors.
>>> f=file('/etc/passwd' [,mode, buffering, etc] )
>>> f
<file '/etc/passwd', mode 'r' at 0x12345678>
>>> i=iter(f) # i=f.open() is equivalent
>>> j=iter(f)
>>> i
<open file '/etc/passwd', mode 'r' at 0x23456789>
>>> i.fileno()
4
>>> j.fileno()
5
The open() method is an alias for an unopened file's __iter__() method.
Note that a closed file is not the same as an unopened file: a closed file
is an exhausted iterator, not a factory for new open files.
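A sketch of such an unopened-file container (UnopenedFile is a hypothetical name; each call to iter() opens a fresh, independent file):

```python
import os
import tempfile

class UnopenedFile:
    """Hypothetical 'unopened file': a factory for independent open files."""
    def __init__(self, name, mode='r'):
        self.name = name
        self.mode = mode

    def __iter__(self):
        return open(self.name, self.mode)  # a new, independent open file

    open = __iter__  # f.open() as an alias for iter(f)

# Demo: two iterators over one unopened file are independent.
tmp = tempfile.NamedTemporaryFile('w+', delete=False)
tmp.write('line1\nline2\n')
tmp.close()

uf = UnopenedFile(tmp.name)
a, b = iter(uf), iter(uf)
fds = (a.fileno(), b.fileno())   # distinct file descriptors
first_a, first_b = next(a), next(b)
a.close()
b.close()
os.unlink(tmp.name)
```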
There should also be a method to iterate over files in fixed-size chunks
instead of by lines. The buffering argument to open() could be reused
for this purpose, or a new optional argument could be defined.
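Chunked iteration could be sketched as a simple generator (iter_chunks is a hypothetical name, not a proposed API):

```python
import io

def iter_chunks(fileobj, size=4096):
    """Yield fixed-size blocks from an open file instead of lines."""
    while True:
        chunk = fileobj.read(size)
        if not chunk:   # an empty read means EOF
            return
        yield chunk

# Demo on an in-memory file, with a tiny chunk size.
chunks = list(iter_chunks(io.StringIO('abcdef'), size=2))
```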
Oren Tirosh