Enhanced Generators - reiterating over a file
Oren Tirosh
oren-py-l at hishome.net
Sun Feb 3 08:02:17 EST 2002
On Sat, Feb 02, 2002 at 10:05:11PM -0500, Kragen Sitaker wrote:
> > > That won't work when passed iterators as arguments, right?
> >
> > Sure, but as long as iterators are kept as invisible temporary objects
>
> Um, sure. iter(open("hello")) doesn't work that way already.
This is a real-life example of how useful reiterable x functions can be
and of how to solve the problem of reiterating over a file.
The following function takes a vector of samples and yields a stream of samples
with their values normalized to the range -1 to +1:
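
    from __future__ import generators   # needed for yield in Python 2.2

    def normalized(vector):
        # Pass 1: find the largest absolute value.
        max_value = 0.0
        for sample in vector:
            max_value = max(max_value, abs(sample))
        # Pass 2: scale every sample by it.
        for sample in vector:
            yield sample/max_value
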
Note that this must be a two-pass operation: you cannot yield the first
normalized sample before you find the maximum value.
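
To see why the argument has to be reiterable, here is a small sketch of the same function written in present-day Python 3 syntax (an assumption on my part; the code in this post targets 2.2). Given a one-shot iterator instead of a list, the first loop uses it up and the second loop finds nothing left to yield:

```python
def normalized(vector):
    # Pass 1: find the largest absolute value.
    max_value = 0.0
    for sample in vector:
        max_value = max(max_value, abs(sample))
    # Pass 2: scale every sample by it.
    for sample in vector:
        yield sample / max_value

samples = [0.5, -2.0, 1.0]

from_list = list(normalized(samples))        # a list can be scanned twice
from_iter = list(normalized(iter(samples)))  # an iterator is exhausted by pass 1

print(from_list)
print(from_iter)
```

The second result comes back empty without any error, which is what makes this failure easy to miss.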
My data is in a text file with one decimal sample per line. This code reads
it and writes the normalized samples to another file in the same format.

    vector = map(float, file('samples.dat'))
    outfile = file('normalized.dat','w')
    for sample in normalized(vector):
        print >>outfile, sample
    outfile.close()

This works in Python 2.2. The only problem is that it reads the entire file
into memory. What if the file is too big to read into memory or I just don't
want to stress virtual memory unnecessarily?
Just add the magic x! Change map->xmap and file->xfile, and everything works
exactly the same but without any temporary lists.
xfile is a lazy file object: the object just stores the filename. Only when
iter(xfile('filename')) is called does the returned iterator object open a
temporary file descriptor to walk through the file.
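
A minimal sketch of what such an xfile could look like, in present-day Python 3 syntax. The class body is my own illustration under that assumption, not an existing library type, and the demo file is a throwaway temporary one. Because __iter__ is written as a generator here, the file is actually opened when the first line is requested:

```python
import os
import tempfile

class xfile:
    """Hypothetical lazy file object: it stores only the filename."""

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        # Every call to iter() produces a new generator with its own file
        # handle; the handle is opened when the first line is requested
        # and closed when the pass is finished.
        with open(self.filename) as f:
            for line in f:
                yield line

# Demo on a small temporary file with one sample per line.
path = os.path.join(tempfile.mkdtemp(), "samples.dat")
with open(path, "w") as out:
    out.write("1.0\n-4.0\n2.0\n")

lines = xfile(path)
first_pass = list(lines)
second_pass = list(lines)          # reiterating simply reopens the file

a = iter(lines)
b = iter(lines)
first_a = next(a)
first_b = next(b)                  # b starts from the top, independent of a
rest_a = list(a)
rest_b = list(b)
```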
xmap must be the truly lazy version that returns an iterable object, not the
half-eager half-lazy version that returns an iterator. This is because the
function normalized() has to scan the source twice.
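
The difference can be sketched as follows, again in present-day Python 3 syntax. The xmap class is a hypothetical illustration of the truly lazy version, and Python 3's built-in map stands in for the version that returns an iterator:

```python
class xmap:
    """Hypothetical lazy map: an iterable, not an iterator."""

    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        # Each call starts a fresh pass over the source, so the result is
        # reiterable whenever the source itself is reiterable.
        return (self.func(item) for item in self.iterable)

def normalized(vector):
    max_value = 0.0
    for sample in vector:
        max_value = max(max_value, abs(sample))
    for sample in vector:
        yield sample / max_value

text_lines = ["1.0\n", "-4.0\n", "2.0\n"]

# Reiterable: normalized() can scan it twice.
lazy_result = list(normalized(xmap(float, text_lines)))

# One-shot iterator: the first scan uses it up, the second yields nothing.
one_shot_result = list(normalized(map(float, text_lines)))

print(lazy_result)
print(one_shot_result)
```

No list of floats is built by xmap at any point; each pass converts the lines as they are requested.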
An xfile object really simulates a container - you can use iter() to get
multiple independent iterators of the same container. It should also appeal
to the fans of a certain TV show :-)
A Python file object is not really a container: it can be argued that a file
object already *is* a kind of iterator. It is a temporary object used to walk
through a container. The real container in this case is the actual file on
the disk. An xfile object represents a file on the disk.
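
The claim that a file object already behaves like an iterator can be checked with a short sketch in present-day Python 3 (the temporary file and variable names are only for illustration). Asking the same file object for an iterator twice does not restart at the top of the file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hello")
with open(path, "w") as out:
    out.write("a\nb\nc\n")

with open(path) as f:
    same_object = iter(f) is f   # the file object is its own iterator
    first = next(iter(f))
    second = next(iter(f))       # continues from the current position

print(same_object, repr(first), repr(second))
```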
x-ly yours,
Oren