Easy way to remove HTML entities from an HTML document?

Robert Brewer fumanchu at amor.org
Sun Jul 25 16:21:18 EDT 2004


Robert Oschler wrote:
> Is there a module/function to remove all the HTML entities
> from an HTML document (e.g. &nbsp;, &amp;, &apos;, etc.)?

Grab cleanhtml.py from the bottom of
http://www.aminus.org/rbre/python/index.html. You should be able to
quickly adapt its Plaintext class so that it only replaces (or
removes) entities; at least the regex is already written for you.
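If you'd rather not adapt cleanhtml.py, here is a minimal sketch of both
options using only the standard library. It is not the Plaintext class
from cleanhtml.py: it swaps in html.unescape (Python 3.4+) for
replacing, and a hypothetical regex, ENTITY_RE, for removing. The
function names replace_entities and remove_entities are illustrative
only, and the regex assumes every entity ends with a semicolon.

```python
import html
import re

# Hypothetical pattern: named (&amp;), decimal (&#39;) and hex (&#x27;)
# character references. Assumes a trailing semicolon, so bare forms
# such as "&nbsp" are deliberately not matched.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def replace_entities(text: str) -> str:
    """Decode entities into the characters they stand for."""
    # html.unescape handles named, decimal and hex references.
    return html.unescape(text)


def remove_entities(text: str, replacement: str = "") -> str:
    """Drop entity references, optionally substituting a string."""
    return ENTITY_RE.sub(replacement, text)


if __name__ == "__main__":
    sample = "Tom&nbsp;&amp;&nbsp;Jerry&#39;s"
    print(repr(replace_entities(sample)))
    print(remove_entities(sample))
    print(remove_entities(sample, " "))
```

Deleting entities outright glues neighbouring words together
("TomJerrys"), so passing " " as the replacement is often the more
useful choice for &nbsp;-heavy pages.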

HTH!


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org


