sgmllib.py not good at handling <br/>
Chris Withers
chrisw at nipltd.com
Mon May 14 10:53:23 EDT 2001
Gilles Lenfant wrote:
>
> Hmmm.
>
> Aren't constructs like <tag/> an XML-specific feature for empty elements?
> Your sample is XHTML (HTML reformulated as XML) rather than traditional
> HTML (an SGML application).
> AFAIK, SGML empty elements don't need the trailing "/".
> Try using xmllib instead of sgmllib (your code may need some rework).
So is SGML a subset of XML?
This code is for my HTML filtering module:
http://www.zope.org/Members/chrisw/StripOGram
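Gilles's suggestion amounts to switching to a parser that recognises XML-style empty elements. The following is a minimal sketch that assumes Python 3, where sgmllib no longer ships. It uses the standard library's `html.parser.HTMLParser`, which is a different module from the xmllib Gilles names. `HTMLParser` reports `<br/>` through `handle_startendtag`, so the text that follows is not swallowed. The class name `EventLogger` and the tuple format are hypothetical and were chosen to mirror the print-based handlers in the original post.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical parser that records callbacks instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_data(self, data):
        self.events.append(('data', data))

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty elements such as <br/>.
        self.events.append(('empty', tag, attrs))


parser = EventLogger()
parser.feed('Roses <b>are</B> red,<br/>violets <i>are</i> blue')
parser.close()
for event in parser.events:
    print(event)
```

With this parser, the text after `<br/>` arrives as `'violets '`, followed by ordinary start and end events for `<i>`. Tag names are lowercased, so `</B>` is reported as `b`.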
>
> "Chris Withers" <chrisw at nipltd.com> wrote in message news:
> mailman.989843000.26141.python-list at python.org...
> > Hi,
> >
> > I posted this to the bug tracker:
> >
> > http://sourceforge.net/tracker/?func=detail&aid=423779&group_id=5470&atid=105470
> >
> > ...but it's holding me up badly, so I thought I'd ask here too in the hope
> > that one of you kind souls can help out :-)
> >
> > When parsing the following HTML:
> >
> > 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
> >
> > ...with the following class:
> >
> > import sgmllib
> >
> > class HTML2SafeHTML(sgmllib.SGMLParser):
> >
> >     def handle_data(self, data):
> >         print "***data***"
> >         print data
> >
> >     def unknown_starttag(self, tag, attrs):
> >         print "***start**"
> >         print tag
> >         print attrs
> >
> >     def unknown_endtag(self, tag):
> >         print "***end**"
> >         print tag
> >
> > I get the following output, which isn't right :-S
> >
> > ***data***
> > Roses
> > ***start**
> > b
> > []
> > ***data***
> > are
> > ***end**
> > b
> > ***data***
> > red,
> > ***start**
> > br
> > []
> > ***data***
> > >violets <i>are<
> > ***end**
> > br
> > ***data***
> > i> blue
> >
> > Any idea what's broken, where, and how to fix it? I get the same result with
> > htmllib.py in Python 1.5.2, 2.0 and the latest version from CVS.
> >
> > cheers,
> >
> > Chris
> >
>
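The output in the original post matches SGML's null end tag (NET) shorthand, in which `<tag/data/` means `<tag>data</tag>`. Under that shorthand, `<br/` opens such a construct, everything up to the next `/` becomes the element's content, and parsing resumes at `i> blue`.

The sketch below reproduces that split with a regular expression. The pattern is an assumption modelled on the shorthand, not a copy of sgmllib's source. The sketch also defines a hypothetical, deliberately naive `drop_empty_tag_slash` helper that rewrites `<tag/>` as `<tag>` before the markup reaches an SGML-style parser. It targets Python 3.

```python
import re

# Assumed pattern for the SGML NET shorthand <tag/data/ (== <tag>data</tag>),
# modelled on the behaviour reported in the post, not copied from sgmllib.
SHORTTAG = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')

html = 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'

match = SHORTTAG.search(html)
tag, data = match.group(1, 2)
rest = html[match.end():]
print(tag)   # br
print(data)  # >violets <i>are<
print(rest)  # i> blue

# Matches XML-style empty tags: <br/>, <br />, <img src="a.png" />.
EMPTY_TAG = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9]*)([^<>]*?)/>')


def drop_empty_tag_slash(markup):
    """Hypothetical helper: rewrite <tag .../> as <tag ...>.

    Naive on purpose: it ignores comments, CDATA sections and any
    '/>' or '>' inside attribute values.
    """
    return EMPTY_TAG.sub(
        lambda m: '<%s%s>' % (m.group(1), m.group(2).rstrip()), markup)


cleaned = drop_empty_tag_slash(html)
print(cleaned)                   # Roses <b>are</B> red,<br>violets <i>are</i> blue
print(SHORTTAG.search(cleaned))  # None: no "<tag/" left to trigger the shorthand
```

A preprocessing pass like this should let an existing SGMLParser subclass see `<br>` as a plain start tag. A parser that understands XML-style empty elements avoids the extra regex pass entirely.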