Removing tags from a html-file

Siggy Brentrup bsb at winnegan.de
Mon Jan 17 16:12:20 EST 2000


Thomas Weholt <thomas at bibsyst.no> writes:

> Hi,
> 
> I want to remove all tags from an HTML file, so that only the plain text
> remains. Can this be done using htmllib.py, and if so, how?

The following should do the trick:

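from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter

# DumbWriter emits plain text; AbstractFormatter drives it from parser events.
parser = HTMLParser(AbstractFormatter(DumbWriter()))
parser.feed(YOUR_DATA)   # YOUR_DATA: a string holding the HTML source
parser.close()           # flush any text still buffered in the parser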

Instantiated without arguments, DumbWriter writes to stdout.
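
For readers on Python 3, where htmllib is no longer available, roughly the
same effect can be had with html.parser from the standard library. The
following is only a minimal sketch, not a drop-in replacement for the
snippet above: the names TextExtractor and strip_tags are made up for
illustration, and it returns the text as a string instead of writing it to
stdout.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keep text nodes, ignore all tags."""

    def __init__(self):
        # convert_charrefs=True turns references such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string with the tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush text still buffered in the parser
    return "".join(parser.chunks)


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b>!</p>"))
```

Note that this sketch does no layout at all and also keeps the contents of
script and style elements, so real pages may need extra filtering.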


 - Siggy

-- 
Siggy Brentrup - bsb at winnegan.de - http://www.winnegan.de/
                  bsb at north.de - http://www.north.de/~bsb/
****** ceterum censeo javascriptum esse restrictam *******



