HTML filtering

Gerhard Häring gh_pythonlist at gmx.de
Wed May 1 15:27:54 EDT 2002


* Stuart D. Gathman <stuart at bmsi.com> [2002-05-01 19:06 +0000]:
> I need to filter HTML to remove certain constructs (e.g. <script ...> ...
> </script>).  [...]

I had a similar problem once (this was even my first post to this
newsgroup, IIRC). I was told to look at these scripts:

http://cgi.algonet.se/htbin/cgiwrap/ug/show.py?script=ripurl.py

http://cgi.algonet.se/htbin/cgiwrap/ug/show.py?script=sgmlecho.py

Basically, use sgmllib instead of htmllib and override the unknown_*
methods (unknown_starttag, unknown_endtag, unknown_charref,
unknown_entityref).
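
A minimal sketch of that idea, under some assumptions: sgmllib was
removed in Python 3, so this uses html.parser.HTMLParser as a stand-in.
There the catch-all hooks are called handle_starttag / handle_endtag
rather than unknown_starttag / unknown_endtag. The names ScriptStripper,
BLOCKED and strip_scripts are made up for illustration and are not taken
from the scripts linked above.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Echo HTML back out, dropping blocked elements and their contents."""

    BLOCKED = {"script"}  # hypothetical blocklist; extend as needed

    def __init__(self):
        # convert_charrefs=False so entities reach the handlers unconverted
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a blocked element

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKED:
            self.skip_depth += 1
        elif not self.skip_depth:
            # re-emit the start tag exactly as it appeared in the input
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>
        if tag not in self.BLOCKED and not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.BLOCKED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append("<!%s>" % decl)


def strip_scripts(html_text):
    """Return html_text with every blocked element removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(strip_scripts('<p>Hi</p><script>alert(1);</script><p>bye</p>'))
```

This only removes whole elements by tag name. It does not touch things
like on* event attributes or javascript: URLs, so treat it as a starting
point rather than a complete sanitizer.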

HTH,

Gerhard
-- 
This sig powered by Python!
Außentemperatur in München: 15.2 °C      Wind: 2.7 m/s




