formatter output to list

Justin Shaw wyojustin at hotmail.com
Tue May 21 20:50:19 EDT 2002


Option 2 (a list-like object with a write() method) seems like the
easiest.  You don't need to know any internals of AbstractWriter.  It looks
like the writer passes pretty small chunks rather than whole lines.  Here is
a bare-bones example.

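import sys
import htmllib
import formatter

class Catcher:
    """Minimal file-like object: collects whatever is written to it."""
    def __init__(self):
        self.lines = []   # chunks as written by DumbWriter, not whole lines
    def write(self, line):
        self.lines.append(line)
    def __getitem__(self, index):
        # lets a for loop iterate over the collected chunks
        return self.lines[index]

def html2txt( fh ):
    oh = Catcher()
    p = htmllib.HTMLParser(
        formatter.AbstractFormatter(formatter.DumbWriter(oh)))
    p.feed(fh.read())
    p.close()   # process any data still buffered in the parser
    return oh

def __test__():
    fh = open(r'c:\temp\junk.html')
    txt = html2txt(fh)
    fh.close()
    for line in txt:
        # chunks carry their own spaces and newlines, so write them
        # as-is; "print line," would put an extra space between them
        sys.stdout.write(line)

if __name__ == '__main__':
    __test__()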
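
Since the writer seems to pass small chunks rather than whole lines, the
collected list probably needs one more step before it is really a list of
lines.  Here is a rough sketch of that step, assuming it is enough to join
the chunks and split on newlines.  LineCatcher and getlines are names I made
up for illustration, and the chunks are fed in by hand as a stand-in for
whatever DumbWriter would actually write.

```python
class LineCatcher:
    """File-like object that collects written chunks (hypothetical name)."""
    def __init__(self):
        self.chunks = []

    def write(self, chunk):
        self.chunks.append(chunk)

    def getlines(self):
        # Join the chunks first, then split on newlines to get whole lines.
        return ''.join(self.chunks).splitlines()

# Chunks fed in by hand, standing in for what a writer might pass.
oh = LineCatcher()
for chunk in ['Hello', ' ', 'world', '\n', 'second', ' ', 'line', '\n']:
    oh.write(chunk)
lines = oh.getlines()
```

With something like this, html2txt could return oh.getlines() instead of oh.
The standard StringIO module would likely do the same job with
getvalue().splitlines(), if you would rather not write the class yourself.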


"John Hunter" <jdhunter at nitace.bsd.uchicago.edu> wrote in message
news:m2u1p11nor.fsf at mother.paradise.lost...
>
> I have a urlopen file object that I am passing to formatter DumbWriter
> to strip the html
>
> def html2txt( fh ):
>     oh = open('temp.out', 'w')
>     p = htmllib.HTMLParser(
>         formatter.AbstractFormatter(formatter.DumbWriter(oh)))
>     p.feed(fh.read())
>
> I am then doing some post processing on the file 'temp.out'.
>
> Rather than communicating via the file 'temp.out', I want the
> DumbWriter to return a list of lines.  I see two solutions: derive a
> new class from AbstractWriter or pass a list like object which
> implements the necessary file object methods to html2txt and have that
> func return the modified object.
>
> Suggestions?
> Thanks,
> John Hunter




