Easy way to remove HTML entities from an HTML document?
Robert Brewer
fumanchu at amor.org
Sun Jul 25 16:21:18 EDT 2004
Robert Oschler wrote:
> Is there a module/function to remove all the HTML entities
> from an HTML document (e.g. &nbsp;, &amp;, &apos;, etc.)?
Grab cleanhtml.py from the bottom of
http://www.aminus.org/rbre/python/index.html -- you should be able to
quickly rewrite the Plaintext class so that it only replaces (or
removes) entities. At least the regex is already written for you.
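
For a rough idea of the regex approach, here is a minimal sketch. It is
not the code from cleanhtml.py; the names ENTITY_RE, replace_entities and
strip_entities are made up for illustration, and the pattern assumes
entities are well-formed (terminated by a semicolon).

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern (not taken from cleanhtml.py): matches named
# (&amp;), decimal (&#160;) and hexadecimal (&#xA0;) entities.
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _decode(match):
    """Return the character for one matched entity, or the entity
    unchanged if it is unknown or out of range."""
    body = match.group(1)
    try:
        if body.startswith(("#x", "#X")):
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
    except (ValueError, OverflowError):
        # Numeric reference outside the valid code point range.
        return match.group(0)
    codepoint = name2codepoint.get(body)
    if codepoint is None:
        # Name not in the HTML 4 table: leave it as written.
        return match.group(0)
    return chr(codepoint)


def replace_entities(text):
    """Replace each entity with the character it stands for."""
    return ENTITY_RE.sub(_decode, text)


def strip_entities(text):
    """Remove every entity outright."""
    return ENTITY_RE.sub("", text)
```

replace_entities("a&amp;b") gives "a&b", while strip_entities("a&amp;b")
gives "ab". On Python 3.4 or later, html.unescape() in the standard
library already covers the replacing case, so the regex is mostly useful
when you want to delete the entities instead.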
HTH!
Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org