sgmllib.py not good at handling <br/>

Chris Withers chrisw at nipltd.com
Mon May 14 10:56:21 EDT 2001


Chris Withers wrote:
> 
> Gilles Lenfant wrote:
> >
> > Hmmm.
> >
> > Aren't constructs like <tag/> an XML-specific feature for empty elements?
> > Your sample is XHTML (HTML from XML) rather than traditional HTML (from
> > SGML).
> > AFAIK, SGML empty elements don't need the trailing "/".
> > Try to use xmllib in place of sgmllib (your code will perhaps need some
> > rework).
> 
> So is SGML a subset of XML?
> 
> This code is for my HTML filtering module:
> http://www.zope.org/Members/chrisw/StripOGram

Damn keyboard ;-)

Anyway, my main concern is preventing people from smuggling dodgy tags through,
like so:
> 
>   html2safehtml ('Roses <b>are</B> red,<br/<blink>QUACK<//blink> violets '
>                  '<i>are</i> blue', 
>                  valid_tags=['b','i','br'])
> 
> successfully smuggling a <blink>...</blink> inside the result:
> 
>        'Roses <b>are</b> red,<br><blink>QUACK</blink> violets <i>are</i> blue'
> 
> (Notice that the closing '</i>' is now OK again, and that I had to use
> '<//blink>' in order to get '</blink>'.)

Would xmllib.py be the way to go for this? How fast is that compared to
sgmllib.py?
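
To make the concern concrete, here is a minimal sketch of a whitelist filter.
It is not the StripOGram code. It assumes Python 3, where sgmllib and xmllib
no longer exist, and so uses html.parser instead. The names TagWhitelistFilter
and strip_unlisted_tags are hypothetical. The idea is to never pass the input
through: only tags whose parsed name is on the whitelist are written out
again, attributes are dropped, and all text is escaped.

```python
from html import escape
from html.parser import HTMLParser


class TagWhitelistFilter(HTMLParser):
    """Hypothetical filter: rebuild output from whitelisted tags and escaped text."""

    def __init__(self, valid_tags):
        super().__init__(convert_charrefs=True)
        self.valid_tags = {t.lower() for t in valid_tags}
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names. Attributes are deliberately dropped.
        if tag in self.valid_tags:
            self.parts.append("<%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # An XML-style empty tag such as '<br/>' is written out as '<br>'.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.valid_tags:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Escaping text means a stray '<' never reaches the output as markup.
        self.parts.append(escape(data, quote=False))


def strip_unlisted_tags(text, valid_tags):
    """Return text with only the tags named in valid_tags kept."""
    parser = TagWhitelistFilter(valid_tags)
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

Because every '<' in the result comes from a whitelisted tag name, malformed
input such as '<br/<blink>QUACK<//blink>' cannot put a <blink> tag into the
output, however the parser chooses to tokenize it. What happens to the text
around the malformed tags depends on the parser version, so the sketch makes
no promise about that.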

cheers,

Chris



