Removing tags from a html-file
Siggy Brentrup
bsb at winnegan.de
Mon Jan 17 16:12:20 EST 2000
Thomas Weholt <thomas at bibsyst.no> writes:
> Hi,
>
> I want to remove all tags from a html-file, so that only the plain-text
> remains. How, if so, can this be done using htmllib.py?
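The following should do the trick:

from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter

# DumbWriter emits plain text; AbstractFormatter passes the parsed text on to it.
parser = HTMLParser(AbstractFormatter(DumbWriter()))
parser.feed(YOUR_DATA)  # YOUR_DATA: a string holding the HTML source
parser.close()          # flush any text still buffered in the parser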
Instantiated without arguments, DumbWriter writes to stdout.
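
For readers on Python 3, where htmllib is no longer available (and the formatter module has been removed from recent releases), roughly the same effect can be sketched with html.parser from the standard library. This is only a minimal sketch, not a drop-in replacement: the names TextExtractor and strip_tags are made up for illustration, and unlike DumbWriter it does no line wrapping or paragraph formatting; it simply concatenates the text between tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text between tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text outside of markup.
        self.chunks.append(data)


def strip_tags(html_text):
    """Return html_text with all tags removed (illustrative name)."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered in the parser
    return "".join(parser.chunks)


print(strip_tags("<p>Hello, <b>world</b>!</p>"))  # prints: Hello, world!
```

Note that the contents of script and style elements are also reported as text by this sketch, so they would need to be skipped separately if that matters for your data.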
- Siggy
--
Siggy Brentrup - bsb at winnegan.de - http://www.winnegan.de/
bsb at north.de - http://www.north.de/~bsb/
****** ceterum censeo javascriptum esse restrictam *******