sgmllib.py not good at handling <br/>

Gilles Lenfant glenfant.nospam at bigfoot.com
Mon May 14 09:51:33 EDT 2001


Hmmm.

Aren't constructs like <tag/> an XML-specific feature for empty elements?
Your sample is XHTML (HTML reformulated as XML) rather than traditional HTML
(defined in SGML).
AFAIK, SGML empty elements don't need the trailing "/".
Try using xmllib instead of sgmllib (your code may need some rework).
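The sketch below illustrates what seems to be happening. It assumes that sgmllib implements SGML's null-end-tag shorthand, which reads `<tag/data/` as `<tag>data</tag>`. The regular expression is my guess at the short-tag pattern sgmllib uses, not code copied from the module. `explain_shorttag` is a hypothetical helper. The sketch needs only `re`, so it also runs where sgmllib is unavailable (Python 3 no longer ships it). Applied to the sample from the quoted message, the match reproduces the `br` start tag, the odd data, and the `br` end tag seen in the output.

```python
import re

# Assumption: mirrors sgmllib's short-tag pattern for SGML's null-end-tag
# shorthand, where "<tag/data/" is read as "<tag>data</tag>".
SHORTTAG = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')

def explain_shorttag(html):
    """Hypothetical helper: return (tag, data, rest) for the first
    short-tag match in html, or None if there is no match."""
    m = SHORTTAG.search(html)
    if m is None:
        return None
    return m.group(1), m.group(2), html[m.end():]

sample = 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
tag, data, rest = explain_shorttag(sample)
print(tag)   # br
print(data)  # >violets <i>are<
print(rest)  # i> blue
```

The parser then treats everything after the match (`i> blue`) as ordinary data, which matches the last chunk of the output.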

"Chris Withers" <chrisw at nipltd.com> wrote in message news:
mailman.989843000.26141.python-list at python.org...
> Hi,
>
> I posted this to the bug Tracker:
>
> http://sourceforge.net/tracker/?func=detail&aid=423779&group_id=5470&atid=105470
>
> ...but it's holding me up badly, so I thought I'd ask here too in the hope
> that one of you kind souls can help out :-)
>
> When parsing the following HTML:
>
> 'Roses <b>are</B> red,<br/>violets <i>are</i> blue'
>
> ...with the following class:
>
> import sgmllib
>
> class HTML2SafeHTML(sgmllib.SGMLParser):
>
>     def handle_data(self, data):
>         print "***data***"
>         print data
>
>     def unknown_starttag(self, tag, attrs):
>         print "***start**"
>         print tag
>         print attrs
>
>     def unknown_endtag(self, tag):
>         print "***end**"
>         print tag
> I get the following output, which isn't right :-S
>
> ***data***
> Roses
> ***start**
> b
> []
> ***data***
> are
> ***end**
> b
> ***data***
> red,
> ***start**
> br
> []
> ***data***
> >violets <i>are<
> ***end**
> br
> ***data***
> i> blue
>
> Any idea what's broken, where, and how to fix it? I get the same result with
> htmllib.py in Python 1.5.2, 2.0, and the latest from CVS.
>
> cheers,
>
> Chris
>
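For the "how to fix it" question, one workaround short of switching to xmllib is to rewrite XML-style empty tags into plain HTML before calling `feed()`. This sketch assumes the input uses `/>` only for empty elements and never puts `<` or `>` inside attribute values. `strip_xml_empty_slash` and `EMPTY_TAG` are hypothetical names, not part of sgmllib.

```python
import re

# Hypothetical helper: turn XML-style empty tags such as <br/> or
# <img src="a.gif" /> into plain HTML (<br>, <img src="a.gif">), so an
# SGML-based parser never sees the "/" shorthand.
# Naive: attribute values containing "<" or ">" are not handled.
EMPTY_TAG = re.compile(r'<([A-Za-z][-.A-Za-z0-9]*)([^<>]*?)\s*/>')

def strip_xml_empty_slash(html):
    # Keep the tag name (group 1) and attributes (group 2), drop the "/".
    return EMPTY_TAG.sub(r'<\1\2>', html)

print(strip_xml_empty_slash('Roses <b>are</B> red,<br/>violets <i>are</i> blue'))
# Roses <b>are</B> red,<br>violets <i>are</i> blue
```

You can then pass the cleaned string to the existing `HTML2SafeHTML` parser's `feed()` as before. A real XML parser remains the more robust option for genuine XHTML.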





More information about the Python-list mailing list