HTML cleaner?
Leif K-Brooks
eurleif at ecritters.biz
Mon Apr 25 06:23:43 EDT 2005
Ivan Voras wrote:
> Is there an HTML clean/tidy library or module written in pure Python? I
> found mxTidy, but it's an interface to a command-line tool.
>
> What I'm searching for is something that will accept a list of allowed
> tags and/or attributes and strip the rest from an HTML string.
Here's a module I wrote that does something along the lines of what you
want: <http://ecritters.biz/limithtml.py>. Unfortunately, it requires
the HTML to be reasonably well-formed (for example, it can't handle
mis-nested tags like "<i><b>foo</i></b>"), so I run the HTML through
uTidyLib (another interface to HTML Tidy) first. I'm not sure why you
don't want to use Tidy, but if you change your mind, you should be able
to use my module alongside Tidy to restrict which HTML elements and
attributes are accepted.
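
For readers who want a feel for the allow-list approach in pure Python,
here is a minimal sketch built on the standard library's html.parser.
It is not the limithtml.py module linked above, whose code is not shown
here; the names AllowlistCleaner and clean_html are hypothetical, and
the shape of the "allowed" mapping (tag name to a set of permitted
attribute names) is an assumption made for illustration.

```python
from html import escape
from html.parser import HTMLParser


class AllowlistCleaner(HTMLParser):
    """Hypothetical helper: keep only allowed tags/attributes, escape all text."""

    def __init__(self, allowed):
        # allowed: dict mapping a lowercase tag name to the set of
        # attribute names that may be kept on that tag (an assumed format).
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # drop the tag itself; its text content still passes through
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in self.allowed[tag]
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit the start tag only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were already decoded by the parser,
        # so re-escape the text before writing it out.
        self.out.append(escape(data, quote=False))


def clean_html(source, allowed):
    """Return source with disallowed tags and attributes stripped."""
    parser = AllowlistCleaner(allowed)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Calling clean_html(page, {"p": set(), "a": {"href"}}) would keep <p>
and <a href="..."> and strip every other tag and attribute. Like the
module described above, this sketch does not repair mis-nested tags, so
running the input through Tidy first still makes sense. It also keeps
the text inside stripped elements and does not inspect attribute values
(a "javascript:" URL in href would pass through), so treat it as a
starting point rather than a complete sanitizer.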