sgmllib.py not good at handling <br/>

Chris Withers chrisw at nipltd.com
Mon May 14 10:53:23 EDT 2001


Gilles Lenfant wrote:
> 
> Hmmm.
> 
> Aren't constructs like <tag/> an XML-specific feature for empty elements?
> Your sample is XHTML (HTML derived from XML) rather than traditional HTML
> (derived from SGML).
> AFAIK, SGML empty elements don't need the trailing "/".
> Try using xmllib in place of sgmllib (your code may need some rework).

So is SGML a subset of XML?

This code is for my HTML filtering module:
http://www.zope.org/Members/chrisw/StripOGram
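
One possible workaround, if switching parsers is not an option, is to rewrite XHTML-style empty tags such as <br/> into plain <br> before the text reaches the SGML-style parser. The sketch below is only an illustration: the helper name strip_self_closing_slash and the regular expression are hypothetical, not part of sgmllib or StripOGram, and the pattern assumes attribute values contain no "<" or ">" characters.

```python
import re

# Matches an XHTML-style empty element such as <br/> or <hr />.
# Assumption: attribute values never contain '<' or '>'.
_SELF_CLOSING = re.compile(r'<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*/>')

def strip_self_closing_slash(html):
    """Rewrite <tag/> as <tag> so the parser sees an ordinary start tag."""
    return _SELF_CLOSING.sub(r'<\1\2>', html)

sample = 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
cleaned = strip_self_closing_slash(sample)
```

The cleaned string could then be passed to the parser's feed() method in place of the original text. This changes the input rather than fixing the parser, so it only helps when dropping the trailing "/" is acceptable.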

> 
> "Chris Withers" <chrisw at nipltd.com> wrote in message news:
> mailman.989843000.26141.python-list at python.org...
> > Hi,
> >
> > I posted this to the bug Tracker:
> >
> > http://sourceforge.net/tracker/?func=detail&aid=423779&group_id=5470&atid=105470
> >
> > ...but it's holding me up badly so I thought I'd ask here too in the hope
> > that one of you kind souls can help out :-)
> >
> > When parsing the following HTML:
> >
> > 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
> >
> > ...with the following class:
> >
> > import sgmllib
> >
> > class HTML2SafeHTML(sgmllib.SGMLParser):
> >
> >     def handle_data(self, data):
> >         print "***data***"
> >         print data
> >
> >     def unknown_starttag(self, tag, attrs):
> >         print "***start**"
> >         print tag
> >         print (attrs)
> >
> >     def unknown_endtag(self, tag):
> >         print "***end**"
> >         print tag
> >
> > I get the following output, which isn't right :-S
> >
> > ***data***
> > Roses
> > ***start**
> > b
> > []
> > ***data***
> > are
> > ***end**
> > b
> > ***data***
> > red,
> > ***start**
> > br
> > []
> > ***data***
> > >violets <i>are<
> > ***end**
> > br
> > ***data***
> > i> blue
> >
> > Any idea what's broken, where, and how to fix it? I get the same with the
> > htmllib.py in Python 1.5.2, 2.0, and the latest from CVS.
> >
> > cheers,
> >
> > Chris
> >
> 