HTML filtering
Gerhard Häring
gh_pythonlist at gmx.de
Wed May 1 15:27:54 EDT 2002
* Stuart D. Gathman <stuart at bmsi.com> [2002-05-01 19:06 +0000]:
> I need to filter HTML to remove certain constructs (e.g. <script ...> ...
> </script>). [...]
I had a similar problem once (this was even my first post to this
newsgroup, IIRC). I was told to look at these scripts:
http://cgi.algonet.se/htbin/cgiwrap/ug/show.py?script=ripurl.py
http://cgi.algonet.se/htbin/cgiwrap/ug/show.py?script=sgmlecho.py
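Basically, use sgmllib instead of htmllib and override the unknown_*
methods (unknown_starttag, unknown_endtag, unknown_charref and
unknown_entityref), so that every tag and entity passes through your
own code and you decide what gets echoed to the output.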
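
A rough sketch of that echo-and-filter idea follows. It is not the code
from the scripts linked above. sgmllib was removed in Python 3, so this
sketch swaps in html.parser.HTMLParser, whose handle_* callbacks play
the role of sgmllib's unknown_* hooks. The names ScriptStripper and
strip_scripts are made up for illustration.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Echo HTML back out, dropping <script> elements and their contents.

    Hypothetical helper: html.parser stands in for sgmllib here.
    """

    def __init__(self):
        # convert_charrefs=False routes entities to the handle_*ref hooks,
        # so they can be echoed unchanged instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of the filtered document
        self.skip_depth = 0  # > 0 while inside a <script> element

    def _emit(self, text):
        if self.skip_depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skip_depth += 1
            return
        # Echo the tag exactly as it appeared in the source.
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closed script is dropped.
        if tag != "script":
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            if self.skip_depth:
                self.skip_depth -= 1
            return
        # The parser lowercases tag names, so end tags come out lowercase.
        self._emit("</%s>" % tag)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def strip_scripts(html):
    """Return html with every <script> element removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling strip_scripts(page_source) returns the filtered markup as a
string. Removing other constructs is a matter of extending the tag
checks in the start and end tag handlers. With sgmllib under Python 2
the same structure would live in unknown_starttag and unknown_endtag.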
HTH,
Gerhard
--
This sig powered by Python!
Außentemperatur in München: 15.2 °C Wind: 2.7 m/s