Simple allowing of HTML elements/attributes?

Graham Fawcett graham__fawcett at hotmail.com
Wed Feb 11 23:58:48 EST 2004


cookedm+news at physics.mcmaster.ca (David M. Cooke) wrote in message news:<qnklln9grv6.fsf at arbutus.physics.mcmaster.ca>...
> At some point, Leif K-Brooks <eurleif at ecritters.biz> wrote:
> 
> > I'm writing a site with mod_python which will have, among other
> > things, forums. I want to allow users to use some HTML (<em>,
> > <strong>, <p>, etc.) on the forums, but I don't want to allow bad
> > elements and attributes (onclick, <script>, etc.). I would also like
> > to do basic validation (no overlapping elements like
> > <strong><em>foo</strong></em>, no missing end tags). I'm not asking
> > anyone to write a script for me, but does anyone have general ideas
> > about how to do this quickly on an active forum?
> 
> You could require valid XML, and use a validating XML parser to
> check conformance. You'd have to make sure the output is correctly
> quoted (for instance, check that HTML tags in a CDATA block get quoted).

You could use Tidy (or tidylib) to convert error-ridden input into
valid HTML or XHTML, and then grab the BODY contents via an XML
parser, as David suggested. I imagine that the library version of tidy
is quick enough to meet your needs.
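
As a rough sketch of the "grab the BODY contents via an XML parser" step, assuming the input has already been run through Tidy and is well-formed XHTML: the function name extract_body is made up for illustration, and the parser is the standard library's xml.etree.ElementTree rather than anything Tidy-specific. It only checks well-formedness, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML documents declare on their root element.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_body(xhtml_text):
    """Return the <body> element of a well-formed XHTML document.

    Returns None if the text is not well-formed XML or has no <body>.
    (extract_body is a hypothetical helper name, not part of any library.)
    """
    try:
        root = ET.fromstring(xhtml_text)
    except ET.ParseError:
        # Overlapping elements and missing end tags both end up here.
        return None
    # Look for a namespaced <body> first, then fall back to a plain one.
    body = root.find("{%s}body" % XHTML_NS)
    if body is None:
        body = root.find("body")
    return body
```

Anything that comes back as None would be rejected (or sent through Tidy again) before it reaches the forum page.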

Or maybe you could use XSLT to cut the "bad stuff" out of your tidied
XHTML. (Not something I'm familiar with, but someone must have done
this before.)
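
I don't have an XSLT stylesheet to offer, but the same "cut the bad stuff out" idea can be sketched in plain Python as a whitelist filter over the parsed tree. This is a stand-in for the XSLT approach, not an XSLT example. ALLOWED_TAGS, ALLOWED_ATTRS and clean are names invented here, and the whitelist contents are only a guess at what a forum might permit. Disallowed elements are dropped together with their contents; the text that follows them is kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: adjust to whatever the forum should permit.
ALLOWED_TAGS = {"p", "em", "strong", "br"}
# Maps tag name -> set of permitted attribute names. Empty means every
# attribute (onclick, style, ...) is stripped.
ALLOWED_ATTRS = {}


def local_name(name):
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return name.rsplit("}", 1)[-1]


def clean(element):
    """Return a copy of element that keeps only whitelisted descendants.

    The element passed in (e.g. <body>) is always kept as the container.
    """
    copy = ET.Element(local_name(element.tag))
    for name, value in element.attrib.items():
        if local_name(name) in ALLOWED_ATTRS.get(copy.tag, ()):
            copy.set(local_name(name), value)
    copy.text = element.text
    last = None  # most recently kept child, so dropped tails can attach to it
    for child in element:
        if local_name(child.tag) in ALLOWED_TAGS:
            last = clean(child)
            last.tail = child.tail
            copy.append(last)
        elif child.tail:
            # Drop the element and its contents, but keep the text after it.
            if last is None:
                copy.text = (copy.text or "") + child.tail
            else:
                last.tail = (last.tail or "") + child.tail
    return copy


src = ET.fromstring(
    '<body><p onclick="evil()">Hi <em>there</em>'
    "<script>alert(1)</script> friend</p></body>"
)
print(ET.tostring(clean(src), encoding="unicode"))
# prints: <body><p>Hi <em>there</em> friend</p></body>
```

If attributes such as href were ever added to the whitelist, their values would need checking too (a javascript: URL is as bad as an onclick), which this sketch does not attempt.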

There's a Python wrapper for tidylib at
http://utidylib.sourceforge.net/ .

-- Graham


