HTML cleaner?
Terry Hancock
hancock at anansispaceworks.com
Mon Apr 25 22:15:13 EDT 2005
On Sunday 24 April 2005 06:25 pm, Ivan Voras wrote:
> Is there an HTML clean/tidy library or module written in pure Python? I
> found mxTidy, but it's an interface to a command-line tool.
>
> What I'm looking for is something that will accept a list of allowed tags
> and/or attributes and strip the rest from an HTML string.
I'm using stripogram for this. It uses a whitelist approach, where you
tell it what tags to accept. It also has a function for getting text only.
http://www.zope.org/Members/chrisw/StripOGram/
It's very useful in Zope, but it is actually an independent pure-Python
module (you don't need Zope to use it). It is also very small.
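
To make the whitelist idea concrete, here is a minimal sketch built on the
standard library's html.parser instead of stripogram. It is not stripogram's
code or API; the names WhitelistCleaner, clean_html and text_only are made up
for illustration. It assumes you want to keep the text inside dropped tags
(including the contents of script and style elements), and it does not check
attribute values such as javascript: URLs, so treat it as a starting point
and not as a security filter.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistCleaner(HTMLParser):
    """Hypothetical helper: keep whitelisted tags/attributes, keep all text."""

    def __init__(self, allowed_tags, allowed_attrs=()):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.allowed_attrs = set(allowed_attrs)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # drop the tag itself; any text inside it is still kept
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in self.allowed_attrs
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # The parser has already decoded entities, so escape again on output.
        self.out.append(escape(data, quote=False))


def clean_html(text, allowed_tags, allowed_attrs=()):
    """Return text with every tag and attribute not on the whitelist removed."""
    parser = WhitelistCleaner(allowed_tags, allowed_attrs)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


def text_only(text):
    """Strip every tag (empty whitelist), leaving only escaped text."""
    return clean_html(text, allowed_tags=())
```

Usage would look like clean_html(s, ["p", "b"], ["href"]) for a filtered
fragment, or text_only(s) for the text-only case mentioned above.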
--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com