HTML cleaner?

Terry Hancock hancock at anansispaceworks.com
Mon Apr 25 22:15:13 EDT 2005


On Sunday 24 April 2005 06:25 pm, Ivan Voras wrote:
> Is there an HTML clean/tidy library or module written in pure Python? I
> found mxTidy, but it's an interface to a command-line tool.
> 
> What I'm looking for is something that will accept a list of allowed tags
> and/or attributes and strip the rest from an HTML string.

I'm using stripogram for this. It uses a whitelist approach, where you
tell it what tags to accept. It also has a function for getting text only.

http://www.zope.org/Members/chrisw/StripOGram/

It's very useful in Zope, but is actually an independent pure-python
module (you don't need Zope to use it).  Also very small.
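
To illustrate the whitelist idea itself, here is a minimal sketch using only
the standard library. It is not stripogram's code or API: the names
WhitelistCleaner and clean_html are made up for this example, and it assumes
Python 3's html.parser module. Tags not on the list are dropped (their text
is kept), and attributes not on the list are stripped from the tags that
remain.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistCleaner(HTMLParser):
    """Hypothetical helper: keep allowed tags/attributes, drop other markup."""

    def __init__(self, allowed_tags, allowed_attrs=()):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.allowed_attrs = set(allowed_attrs)
        self.parts = []  # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # drop the tag itself; its text content still passes through
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in self.allowed_attrs
        )
        self.parts.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape text so stray "<" or "&" cannot turn back into markup.
        self.parts.append(escape(data, quote=False))


def clean_html(text, allowed_tags, allowed_attrs=()):
    """Return text with only the whitelisted tags and attributes left in."""
    cleaner = WhitelistCleaner(allowed_tags, allowed_attrs)
    cleaner.feed(text)
    cleaner.close()
    return "".join(cleaner.parts)


if __name__ == "__main__":
    sample = '<p class="x">Hi <b onclick="evil()">there</b> <i>you</i></p>'
    print(clean_html(sample, allowed_tags=["p", "b"], allowed_attrs=["class"]))
```

Passing an empty tag list strips all markup and leaves only the (still
HTML-escaped) text. Note that this sketch does not inspect attribute values
(for example javascript: URLs) and keeps the text inside dropped tags such as
script, so it should not be treated as a security-grade sanitizer.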

--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks  http://www.anansispaceworks.com



