A file object is an iterator, not a container
Oren Tirosh
oren-py-l at hishome.net
Wed Nov 14 01:25:23 EST 2001
I am currently working on a framework for dataflow-oriented programming.
This framework makes it easy to create dataflows by connecting sources,
transformations and destinations. It makes extensive use of the
new iterators and generator functions defined in Python 2.2. A first public
release is expected soon.
During the design I encountered an inconsistency in the way Python
uses iterators for files: Python pretends that an open file object is a
container that can create iterators, while in fact an open file object
already *is* an iterator with a "nonstandard" protocol (readline instead of
next). The real container is the file residing on disk, not the open
file object.
To see the difference, try creating two iterators from one list and two
iterators from one file. The two list iterators can be read independently,
but the two file iterators cannot.
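A minimal sketch of that difference (the temp file and its contents are just illustrative):

```python
import os
import tempfile

# Write a small three-line file to demonstrate with.
tmp = tempfile.NamedTemporaryFile('w+', delete=False)
tmp.write('line1\nline2\nline3\n')
tmp.flush()
tmp.close()

# Two iterators over one list advance independently.
lst = ['line1\n', 'line2\n', 'line3\n']
i, j = iter(lst), iter(lst)
assert next(i) == 'line1\n'
assert next(j) == 'line1\n'   # independent: j also starts at the top

# Two "iterators" over one open file share the same position,
# because iter(f) just returns the open file object itself.
f = open(tmp.name)
fi, fj = iter(f), iter(f)
assert next(fi) == 'line1\n'
assert next(fj) == 'line2\n'  # coupled: fj continues where fi stopped
f.close()
os.unlink(tmp.name)
```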
My proposal is to make an open file object obey the iterator protocol:

__iter__():
    return self

next():
    return next line
    raise StopIteration on EOF
    <CONTROVERSIAL> close file on EOF </CONTROVERSIAL>
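A minimal sketch of the proposed protocol as a wrapper class (LineIterator and close_on_eof are illustrative names, not part of the proposal; the method is spelled __next__ here, the modern equivalent of next()):

```python
import io

class LineIterator:
    """Sketch of the proposed file-iterator protocol."""
    def __init__(self, fileobj, close_on_eof=False):
        self._file = fileobj
        self._close_on_eof = close_on_eof  # the <CONTROVERSIAL> part

    def __iter__(self):
        return self               # an open file is its own iterator

    def __next__(self):
        line = self._file.readline()
        if not line:              # readline() returns '' only at EOF
            if self._close_on_eof:
                self._file.close()
            raise StopIteration
        return line

# Demo on an in-memory file.
demo = LineIterator(io.StringIO('a\nb\n'))
lines = list(demo)
```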
Is there any chance this could still make it to the final 2.2 release or
would a thorough discussion of this proposal on the list take too long?
A natural future extension of this idea would be to define an unopened file
object. This object would truly act as a container, because it is a factory
for multiple, independent iterators over itself: open files with different
file descriptors.
>>> f=file('/etc/passwd' [,mode, buffering, etc] )
>>> f
<file '/etc/passwd', mode 'r' at 0x12345678>
>>> i=iter(f) # i=f.open() is equivalent
>>> j=iter(f)
>>> i
<open file '/etc/passwd', mode 'r' at 0x23456789>
>>> i.fileno()
4
>>> j.fileno()
5
The open() method is an alias for an unopened file's __iter__() method.
Note that a closed file is not the same as an unopened file: a closed file
is an exhausted iterator, not a factory for new open files.
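A sketch of such an unopened-file container (UnopenedFile is a hypothetical name; each call to iter() opens a fresh, independent file):

```python
import os
import tempfile

class UnopenedFile:
    """Hypothetical 'unopened file': a factory for independent open files."""
    def __init__(self, name, mode='r'):
        self.name = name
        self.mode = mode

    def __iter__(self):
        return open(self.name, self.mode)  # a new, independent open file

    open = __iter__  # f.open() as an alias for iter(f)

# Demo: two iterators over one unopened file are independent.
tmp = tempfile.NamedTemporaryFile('w+', delete=False)
tmp.write('line1\nline2\n')
tmp.close()

uf = UnopenedFile(tmp.name)
a, b = iter(uf), iter(uf)
fds = (a.fileno(), b.fileno())   # distinct file descriptors
first_a, first_b = next(a), next(b)
a.close()
b.close()
os.unlink(tmp.name)
```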
There should also be a method to iterate over files in fixed-size chunks
instead of by lines. The buffering argument to open() could be reused
for this purpose, or a new optional argument could be defined.
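Chunked iteration could be sketched as a simple generator (iter_chunks is a hypothetical name, not a proposed API):

```python
import io

def iter_chunks(fileobj, size=4096):
    """Yield fixed-size blocks from an open file instead of lines."""
    while True:
        chunk = fileobj.read(size)
        if not chunk:   # an empty read means EOF
            return
        yield chunk

# Demo on an in-memory file, with a tiny chunk size.
chunks = list(iter_chunks(io.StringIO('abcdef'), size=2))
```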
Oren Tirosh