Enhanced Generators - reiterating over a file

Oren Tirosh oren-py-l at hishome.net
Sun Feb 3 08:02:17 EST 2002


On Sat, Feb 02, 2002 at 10:05:11PM -0500, Kragen Sitaker wrote:
> > > That won't work when passed iterators as arguments, right?
> > 
> > Sure, but as long as iterators are kept as invisible temporary objects
> 
> Um, sure.  iter(open("hello")) doesn't work that way already.

Here is a real-life example of how useful reiterable x functions can be,
and of how they solve the problem of reiterating over a file.

The following function takes a vector of samples and yields a stream of samples 
with their values normalized to the range +-1:

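def normalized(vector):
    # 'vector' must be reiterable: it is scanned twice.
    # Pass 1: find the largest absolute value.
    max_value = 0.0
    for sample in vector:
        max_value = max(max_value, abs(sample))
    # Pass 2: scale every sample by it.
    # (Assumes at least one non-zero sample, or the division fails.)
    for sample in vector:
        yield sample/max_value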

Note that this must be a two-pass operation: you cannot yield the first
normalized sample before you find the maximum value.

My data is in a text file with one decimal sample per line.  This code reads
it and writes the normalized samples to another file in the same format.

vector = map(float, file('samples.dat'))
outfile = file('normalized.dat','w')
for sample in normalized(vector):
    print >>outfile, sample
outfile.close()

This works in Python 2.2.  The only problem is that it reads the entire file
into memory.  What if the file is too big to read into memory or I just don't 
want to stress virtual memory unnecessarily?

Just add the magic x!  Change map->xmap and file->xfile, and everything works 
exactly the same, but without any temporary lists.

xfile is a lazy file object: the object just stores the filename.  Only when 
iter(xfile('filename')) is called does the returned iterator object open a 
temporary file descriptor to walk through the file.
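
To make that concrete, here is a minimal sketch of what such an xfile could
look like.  It is an illustration only, not an existing class: the name, the
use of a generator method, and the throwaway demo file are all assumptions,
and it is written for Python 3 rather than the 2.2 used above.

```python
import os
import tempfile


class xfile:
    """Hypothetical lazy file object: it stores only the filename."""

    def __init__(self, filename):
        self.filename = filename  # nothing is opened yet

    def __iter__(self):
        # Every call to iter() opens its own handle, so iterators obtained
        # from the same xfile are independent of one another.
        with open(self.filename) as handle:
            for line in handle:
                yield line


# Demo on a throwaway file (the name and contents are made up).
fd, path = tempfile.mkstemp(suffix='.dat')
with os.fdopen(fd, 'w') as out:
    out.write('1.0\n-4.0\n2.0\n')

samples = xfile(path)
first_pass = list(samples)
second_pass = list(samples)   # a second, complete pass over the same file
os.remove(path)
```

Both passes see all three lines, because each one starts from a freshly
opened handle instead of sharing a file position.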

xmap must be the truly lazy version that returns an iterable object, not the 
half-eager half-lazy version that returns an iterator.  This is because the 
function normalized() has to scan the source twice.
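
The difference can be sketched as follows.  The xmap class below is a
hypothetical illustration (not an existing builtin), written for Python 3;
normalized() is repeated from above so the example runs on its own, and the
sample values are made up.

```python
class xmap:
    """Hypothetical lazy map: an iterable, not an iterator."""

    def __init__(self, function, iterable):
        self.function = function
        self.iterable = iterable

    def __iter__(self):
        # A fresh pass over the source on every iter() call.
        for item in self.iterable:
            yield self.function(item)


def normalized(vector):
    max_value = 0.0
    for sample in vector:
        max_value = max(max_value, abs(sample))
    for sample in vector:
        yield sample / max_value


# Truly lazy and reiterable: both passes inside normalized() see the data.
lazy = xmap(float, ['1.0', '-4.0', '2.0'])
result = list(normalized(lazy))

# A plain iterator is used up by the first pass, so the second pass
# finds nothing left to yield.
one_shot = iter([1.0, -4.0, 2.0])
result_from_iterator = list(normalized(one_shot))
```

Here result is [0.25, -1.0, 0.5], while result_from_iterator is an empty
list: the maximum was found, but there were no samples left to scale.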

An xfile object really simulates a container - you can use iter() to get 
multiple independent iterators of the same container.  It should also appeal 
to the fans of a certain TV show :-)

A Python file object is not really a container: it can be argued that a file 
object already *is* a kind of iterator.  It is a temporary object used to walk 
through a container.  The real container in this case is the actual file on 
the disk.  An xfile object represents a file on the disk.
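
A small check of that claim, as a sketch for Python 3 (where an open file
object is its own iterator); the temporary file and its contents are made up
for the demonstration.

```python
import os
import tempfile

# Throwaway file with three samples (hypothetical data).
fd, path = tempfile.mkstemp(suffix='.dat')
with os.fdopen(fd, 'w') as out:
    out.write('1.0\n-4.0\n2.0\n')

with open(path) as f:
    same_object = iter(f) is f      # the file is its own iterator
    first_pass = list(f)            # walks to the end of the file
    second_pass = list(iter(f))     # same object, same position: nothing left
os.remove(path)
```

Asking the file object for a new iterator does not rewind anything, which is
why it behaves like an iterator over the file on disk and not like a
container.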

   x-ly yours,

	Oren




