[Tutor] Re: file filter [Python 2.2 iterators]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Thu, 27 Dec 2001 01:53:37 -0800 (PST)


On Wed, 26 Dec 2001, kevin parks wrote:

> Happy Holidays.

Hey Kevin, long time no see.  Glad to hear from you again.


> I am trying to write some python code that will copy a file to a new
> file but with certain lines filtered out. Let's say that i want the
> new file to have all lines except those that start with a semicolon or
> a letter c. So that an input file that has these lines:
> 
> ;i1 0 1 2 3 2
> i1 6 8 7 9 
> ci2 99 0 0 0 2
> i1 2 3 4
> i2 3 4 4 
> ci1 3 4 4 5
> ;i3 929 92 2
> i4 2 8 9 1
> 
> would yield:
> 
> i1 6 8 7 9
> i1 2 3 4
> i2 3 4 4 
> i4 2 8 9 1


Hmmm!  This sounds interesting!  I thought I might brush up on the new
Iterator stuff that's part of Python 2.2.  Here's something that may help
you:


###
class FilterIterator:
    """This puts a filtering wrapper right on top of an iterator."""

    def __init__(self, filter_func, input_iter):
        """Initializes a filterer of a given input iterator.
        filter_func should be a boolean function that's true on the
        elements that we want to keep."""
        self.filter_func, self.input_iter = filter_func, input_iter


    def __iter__(self):
        return self


    def next(self):
        while 1:                  # loop until a value passes the filter
            next_value = self.input_iter.next()  # raises StopIteration at the end
            if self.filter_func(next_value):
                return next_value
###



In your code before:

> 	for aLine in infile.xreadlines() :

xreadlines() is a function that returns an "iterator" --- some object that
returns lines on demand whenever we call next().  FilterIterator is meant
to sit right on top of an existing iterator and act as the gatekeeper:
if all goes well, it should let through only the lines that look ok.
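To make the "on demand" part concrete, here is a rough sketch of what a for loop does with an iterator behind the scenes: call next() over and over until StopIteration says there's nothing left.  That's also why FilterIterator's "while 1" loop isn't infinite: the underlying iterator's StopIteration simply propagates out of it.  The name drain is made up for illustration, and the getattr line is only there so the sketch also runs under Python 3, which renamed the method to __next__().

```python
def drain(iterator):
    """Hypothetical helper: collect every value an iterator produces,
    roughly the way a for loop consumes it."""
    # Python 2.2 spells the method next(); Python 3 renamed it
    # __next__(), so pick whichever one this iterator actually has.
    step = getattr(iterator, '__next__', None) or iterator.next
    results = []
    while 1:
        try:
            value = step()       # ask for one more value, on demand
        except StopIteration:    # the iterator says it has run dry
            break
        results.append(value)
    return results
```

For example, drain(iter(['i1 2 3 4\n', 'i4 2 8 9 1\n'])) hands back both lines in order, and draining an empty iterator gives an empty list.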

Iterators are explained in gory detail here:

    http://python.sourceforge.net/peps/pep-0234.html

but because the feature is so new, this code is very Python 2.2 specific.
(We can recode the idea to work with older Python versions in another
message if you'd like.)




Let's test to see if it works:

###
>>> lines = ['hello', 'world', 'this', 'is', 'a', 'test']
>>> def isEvenLength(x):
...     return len(x) % 2 == 0
...
>>> myiter = FilterIterator(isEvenLength, iter(lines))
>>> myiter.next()
'this'
>>> myiter.next()
'is'
>>> myiter.next()
'test'
>>> myiter.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/tmp/python-25824Hte", line 17, in next
StopIteration
###


Yes.  *grin*  Wow, that actually worked!



We can use FilterIterator the same way to filter out any line that
starts with 'c' or ';', like this:

###
def isGoodLine(line):
    """A line is "good" if it doesn't begin with 'c' or ';'."""
    return line[:1] not in ('c', ';')   # line[:1] is safe even on ''


def boksa(infilename, outfilename):
    """Copy infilename to outfilename, dropping lines that start with
    'c' or ';'."""
    infile = open(infilename, 'r')
    f = open(outfilename, 'w')
    # Only lines that isGoodLine() accepts make it through the filter.
    for aLine in FilterIterator(isGoodLine, iter(infile)):
        f.write(aLine)
    infile.close()
    f.close()
###
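Before pointing boksa at a real file, one way to sanity-check the filter is to run isGoodLine over the sample lines from your message.  Iterating over a file hands out each line as a string ending in '\n', so a plain list of such strings stands in for the file here.  This copy of isGoodLine uses line[:1] rather than line[0], a small variation so that an empty string can't raise IndexError; the list name sample is just for illustration.

```python
def isGoodLine(line):
    """A line is "good" if it doesn't begin with 'c' or ';'."""
    # line[:1] is '' for an empty string, so this never raises IndexError.
    return line[:1] not in ('c', ';')

# The sample input from the original question, one string per line,
# just as iterating over the file would produce them.
sample = [
    ';i1 0 1 2 3 2\n',
    'i1 6 8 7 9\n',
    'ci2 99 0 0 0 2\n',
    'i1 2 3 4\n',
    'i2 3 4 4\n',
    'ci1 3 4 4 5\n',
    ';i3 929 92 2\n',
    'i4 2 8 9 1\n',
]

# Keep only the lines that pass the filter.
kept = [line for line in sample if isGoodLine(line)]
```

kept comes out as four lines: i1 6 8 7 9, i1 2 3 4, i2 3 4 4 and i4 2 8 9 1.  Note that i1 6 8 7 9 gets through too, since it starts with neither 'c' nor ';'.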


This allows you to filter out lines in your files with minimal changes to
your code's logic.  I'm still a newbie myself with this iterator stuff, so
FilterIterator could probably be improved.  Still, I hope this helps!
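On the "could be improved" front: Python 2.2 also introduced generators (PEP 255), and a generator function can do FilterIterator's job in a few lines, because the for loop does the next()/StopIteration bookkeeping for us.  Treat this as a sketch rather than a drop-in replacement: the name filter_iterator is made up, and under Python 2.2 the module needs "from __future__ import generators" near its top (from 2.3 on, generators are always enabled).

```python
def filter_iterator(filter_func, iterable):
    """Hypothetical generator version of FilterIterator: yield only the
    items of iterable for which filter_func(item) is true."""
    for item in iterable:          # for calls iter() and handles StopIteration
        if filter_func(item):
            yield item             # hand back one item, then pause here


def isEvenLength(x):
    return len(x) % 2 == 0


words = ['hello', 'world', 'this', 'is', 'a', 'test']
evens = list(filter_iterator(isEvenLength, words))
```

Here evens is ['this', 'is', 'test'], the same values the FilterIterator test above produced.  One small design difference: since the for loop calls iter() itself, this version accepts any iterable (a list, or an open file), not just an iterator, so boksa's loop could read "for aLine in filter_iterator(isGoodLine, infile):".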