File-like filter pattern?

Bengt Richter bokr at oz.net
Wed Dec 18 18:58:43 EST 2002


On Wed, 18 Dec 2002 02:41:23 -0800, Erik Max Francis <max at alcyone.com> wrote:

>"Martin v. Löwis" wrote:
>> 
>> Erik Max Francis <max at alcyone.com> writes:
>> 
>> > Is there a commonly accepted pattern for implementing file-like
>> > filters
>> > in Python?
>> 
>> Codecs work this way, see codecs.StreamReader and codecs.StreamWriter.
>
>In my case I dissented from the obvious choice of having the appropriate
>file-like object that the filter connects to (gzip.GzipFile is another
>obvious example) in the constructor, since I wanted people to be able to
>create a bunch of filters (of varying types, of course), and then stream
>them all together, rather than being mandated to create them in reverse
>order, so to speak.  This is how the current implementation of EmPy
>handles streams (you can chain them together manually or let EmPy do it
>for you by passing a list).  At present you attach a filter to a next
>one in the sequence by calling firstFilter.attach(secondFilter), but,
>again, you can just pass a list of filters to the appropriate function,
>and provided they're derived from the given Filter class, all the hard
>work is done for you (the filters are attached to the next in order
>until the last, which is attached to the proxy stdout).
>
>So maybe I should just abstract back one step, and have the base
>("abstract") Filter class implement those methods, and then have the
>basic implementations that are included with EmPy simply use sink as
>their implementation detail.  Does that sound more reasonable than
>asserting that people use a sink attribute?  In a way, it's a painless
>transition, since you just need to offer a next method (that returns
>what's next in line or None).  Call it a holdover from my non-Python
>days, but I tend to prefer interfaces defined in terms of methods,
>rather than attributes.
>
>So in the new formulation, a filter is an object that implements a
>file-like interface (minimally:  write, flush, and close methods), and
>implements methods for hooking them up to another file-like object
>(filter or not) via attach, and possibly detach, with the obvious choice
>of a next method which would indicate what is actually next in line (or
>None).  Does this sound like a better design than the previous one,
>which is file-like interface, and then a sink attribute (not method),
>supplemented by an attach method (which simply sets the sink attribute)?
>What sounds more Pythonic?
>
>Thoughts?
>
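
For concreteness, I read the attach/next formulation in the last quoted
paragraph as something roughly like the following. This is only a guess at
the shape, not EmPy's actual code: UpperFilter, chain, and the private _sink
attribute are made-up names, and io.StringIO stands in for the proxy stdout.

```python
import io

class Filter(object):
    """Hypothetical base: file-like (write/flush/close) plus attach/detach/next."""
    def __init__(self):
        self._sink = None              # implementation detail, not part of the interface

    def attach(self, sink):
        self._sink = sink              # sink may be another filter or any file-like object

    def detach(self):
        sink, self._sink = self._sink, None
        return sink

    def next(self):
        return self._sink              # what is next in line, or None

    def write(self, data):
        self.next().write(data)        # default behaviour: pass data through unchanged

    def flush(self):
        self.next().flush()

    def close(self):
        self.next().close()

class UpperFilter(Filter):
    """Example filter: upper-cases everything written through it."""
    def write(self, data):
        self.next().write(data.upper())

def chain(filters, final):
    """Attach each filter to the one after it, and the last one to `final`."""
    for current, following in zip(filters, filters[1:] + [final]):
        current.attach(following)
    return filters[0]

out = io.StringIO()
head = chain([UpperFilter(), Filter()], out)
head.write("hello\n")
head.flush()
```

After this runs, out holds the upper-cased text, and walking head.next()
repeatedly ends at out.
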
Below is a preliminary sketch, perhaps to consider? IMO the base filter class
should provide as much as possible for free, and an instance should be able to play
either source or sink roles. E.g., I think you should be able to write a filter
class with only a read or write method, and have the base class be able to make
readline or readlines, etc. for you if you don't provide it.

I'm thinking of passing all parameters to filter constructors as keyword parameters for
simple default storage as self.kw by the base class __init__, so in many cases
you could even just inherit the __init__ and access what you need via
self.kw['your_special_filter_parameter'] or the standardly cached self.dst or self.src
which reflect self.kw['src'] and/or self.kw['dst'] (see code sketch below).
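
As a small illustration of the keyword idea, here is a cut-down base class
plus a filter that inherits __init__ unchanged. Prefix and its 'prefix'
parameter are invented for the example, and io.StringIO stands in for a
real output file.

```python
import io

class Filter(object):
    def __init__(self, **kw):
        self.kw = kw                                   # keep every parameter for subclasses
        if kw.get('src') is not None: self.src = kw['src']
        if kw.get('dst') is not None: self.dst = kw['dst']

class Prefix(Filter):
    # No __init__ of its own: parameters arrive via the inherited self.kw.
    def write(self, data):
        self.dst.write(self.kw.get('prefix', '> ') + data)

out = io.StringIO()
Prefix(prefix='| ', dst=out).write('hello\n')
```

Since no src was passed, the Prefix instance has no src attribute at all,
so an accidental self.src access fails loudly with AttributeError.
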

Usage could be spelled many ways, but as an example, to sort, filter out duplicate lines,
and print the last 20 you'd write:

    print Tail(num=20,src=Uniq(src=Sort(src=file('somefile.txt')))).read()

or the other direction

    print >> Sort(dst=Uniq(dst=Tail(num=20,dst=file('somefile.txt','w')))), stuff,to,be,processed

which is using that ugly print thing, but of course you could use the outermost write, e.g.,

    Sort(dst=Uniq(dst=Tail(num=20,dst=file('somefile.txt','w')))).write(stuff_to_be_processed)

(Of course this assumes Tail, Uniq, and Sort classes are available. Note that you could
reuse much of the functionality implementation to make the classes support filtering
in both directions).

I think this could be done with something like the (obviously untested) sketch further below.
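
Before getting to the base class, here is roughly what the three filter
classes might look like for the read direction only. LineFilter and its
process hook are hypothetical helpers of mine, Uniq is assumed to drop
adjacent duplicates the way Unix uniq does, and io.StringIO stands in for
the input file.

```python
import io

class LineFilter(object):
    """Hypothetical helper base: subclasses only override process(lines)."""
    def __init__(self, **kw):
        self.kw = kw
        self.src = kw['src']           # these examples are source-side only

    def process(self, lines):
        return lines                   # default: pass lines through untouched

    def readlines(self):
        return list(self.process(self.src.readlines()))

    def read(self):
        return ''.join(self.readlines())

class Sort(LineFilter):
    def process(self, lines):
        return sorted(lines)

class Uniq(LineFilter):
    def process(self, lines):
        kept = []
        for line in lines:             # drop a line equal to the one just before it
            if not kept or kept[-1] != line:
                kept.append(line)
        return kept

class Tail(LineFilter):
    def process(self, lines):
        num = self.kw.get('num', 10)
        if num > 0:
            return lines[-num:]
        return []

src = io.StringIO("pear\napple\npear\nfig\napple\n")
result = Tail(num=2, src=Uniq(src=Sort(src=src))).read()
```

Each filter only needs its src to offer readlines(), so a real file and
another filter are interchangeable there.
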

BTW, ISTM there is also a pipeline buffering issue. I.e., one could imagine a base class that could
support chunky data passing with status also passing through to allow non-blocking polling at
the end points. But I haven't thought of all the ramifications. Thread alternatives would have
to be thought through. I guess I'll wait for reactions ;-)
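
To make the status-passing idea a bit less vague, here is a toy of what I
mean, with no threads and no real blocking involved. ChunkPipe and the three
status values are invented for this example; readNoWait is the name floated
in the comments of the sketch below.

```python
from collections import deque

DATA, EMPTY, EOF = 'data', 'empty', 'eof'    # hypothetical status values

class ChunkPipe(object):
    """Toy pipe: writers push chunks, readers poll without ever blocking."""
    def __init__(self):
        self.chunks = deque()
        self.closed = False

    def write(self, data):
        if self.closed:
            raise ValueError("write to closed pipe")
        if data:
            self.chunks.append(data)

    def close(self):
        self.closed = True             # no more writes; queued chunks can still be read

    def readNoWait(self):
        if self.chunks:
            return DATA, self.chunks.popleft()
        if self.closed:
            return EOF, ''             # drained and closed: a real end of data
        return EMPTY, ''               # nothing right now, but more may arrive

pipe = ChunkPipe()
first = pipe.readNoWait()              # polled before anything was written
pipe.write('abc')
second = pipe.readNoWait()
pipe.close()
third = pipe.readNoWait()
```

The point is only that "no data yet" and "end of data" come back as
different statuses, which a plain read() returning '' cannot express.
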

    class Filter(object):
        def __init__(self, **kw):
            self.kw = kw
            # Don't bind src/dst to None, so that touching a missing one raises
            # AttributeError: '<your class name>' object has no attribute 'src'
            src = kw.get('src')
            if src is not None: self.src = src
            dst = kw.get('dst')
            if dst is not None: self.dst = dst   # similarly
            self.excessread = ''     # for buffering readline remainders
            self.excesswrite = ''    # for allowing partial writes

        # possibly line input iteration could be inherited?
        def __iter__(self): return self         # XXX ??
        def next(self):                         # XXX ??
            line = self.readline()
            if not line: raise StopIteration    # '' from readline means EOF
            return line
        __next__ = next                         # same method under its Python 3 name

        def read(self, n=-1):
            raise NotImplementedError    # I.e., must be overridden for source filters
            # XXX ?? how to indicate no data now, but not EOF?
            #        Should we have a readNoWait that can return (status, data) tuple?

        def readline(self, size=-1):
            # (optionally overrideable) default built on self.read; anything read
            # past the end of the line is kept in self.excessread for the next call.
            # XXX mixing read() and readline() calls would need read to drain
            #     self.excessread first.
            buf, self.excessread = self.excessread, ''
            while '\n' not in buf and (size < 0 or len(buf) < size):
                chunk = self.read(1024)
                if not chunk: break             # EOF
                buf += chunk
            end = buf.find('\n') + 1 or len(buf)
            if size >= 0: end = min(end, size)
            self.excessread = buf[end:]
            return buf[:end]

        def readlines(self, sizehint=None):
            # (optionally overrideable) default built on self.readline; sizehint is ignored
            lines = []
            while 1:
                line = self.readline()
                if not line: break
                lines.append(line)
            return lines

        def write(self, data):
            raise NotImplementedError    # I.e., must be overridden for destination filters
            # XXX ?? should write functions return a measure of how much was actually written
            #        so pipelining can be done with flexible chunking?
            #        or maybe a separate writeNoWait that returns (status, nwritten) ??

        def writelines(self, lines):
            # (optionally overrideable) default built on self.write
            for line in lines: self.write(line)

        def flush(self):
            # XXX flush via whatever methods are available ??
            for io in ('src', 'dst'):
                fio = self.kw.get(io)
                if fio is not None: getattr(fio, 'flush', lambda: None)()

        def close(self):
            # XXX close via whatever methods are available ??
            for io in ('src', 'dst'):
                fio = self.kw.get(io)
                if fio is not None: getattr(fio, 'close', lambda: None)()

        def isatty(self): return False    # ?? XXX
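
For the write direction, note that something like Sort cannot pass anything
downstream until it has seen all of its input, so it has to buffer and do
the real work in close(). A rough self-contained illustration follows, with
a cut-down base class rather than the full sketch above; Collector is a
made-up stand-in for an output file.

```python
class Filter(object):
    def __init__(self, **kw):
        self.kw = kw
        if kw.get('dst') is not None: self.dst = kw['dst']

    def write(self, data):
        raise NotImplementedError      # destination filters must override this

    def close(self):
        closer = getattr(self.kw.get('dst'), 'close', None)
        if closer is not None: closer()   # close downstream only if it knows how

class Sort(Filter):
    def __init__(self, **kw):
        Filter.__init__(self, **kw)
        self.pending = ''              # everything written so far

    def write(self, data):
        self.pending += data           # nothing can be emitted before the end

    def close(self):
        lines = self.pending.splitlines(True)      # True keeps the line endings
        self.dst.write(''.join(sorted(lines)))
        Filter.close(self)

class Collector(object):
    """Hypothetical sink that just remembers what was written to it."""
    def __init__(self):
        self.data = ''
    def write(self, data):
        self.data += data

sink = Collector()
sorter = Sort(dst=sink)
sorter.write("pear\napple\n")
sorter.write("fig\n")
before_close = sink.data               # still empty: Sort is holding everything
sorter.close()
```

Nothing reaches the sink until close(), which is one more reason the
flush/close semantics of the base class need some thought.
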


Regards,
Bengt Richter


