[ python-Bugs-1117302 ] sgmllib.SGMLParser
SourceForge.net
noreply at sourceforge.net
Tue Feb 8 09:03:36 CET 2005
Bugs item #1117302, was opened at 2005-02-06 15:04
Message generated for change (Comment added) made by effbot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1117302&group_id=5470
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Paul Birnie (pbirnie)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib.SGMLParser
Initial Comment:
sgmllib.SGMLParser calls the start and end tag methods
correctly until it encounters
<a title="link1" href="url1">One</a>
<br/><a title="link2" href="someurl2">Two</a>
<a title="link2" href="url3">Three</a>
The <br/> seems to confuse the parser, and I only get
callbacks for tag a twice (links 1 and 3).
----------------------------------------------------------------------
>Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:03
Message:
footnote 2: if you need to deal with broken HTML, use
TidyLib:
http://utidylib.berlios.de/
http://effbot.org/zone/element-tidylib.htm
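
For readers on Python 3, where sgmllib is no longer available, the sketch below feeds the markup from the initial comment to the standard library's html.parser instead. This is not the TidyLib approach recommended above. It is only an illustration, and the LinkCollector class and the sample variable are hypothetical names. html.parser treats <br/> as an empty element, so all three links are reported.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


# The markup from the initial comment.
sample = (
    '<a title="link1" href="url1">One</a>\n'
    '<br/><a title="link2" href="someurl2">Two</a>\n'
    '<a title="link2" href="url3">Three</a>\n'
)

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.hrefs)
```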
----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot)
Date: 2005-02-08 09:01
Message:
footnote: <br/> is an XML construct, and is not valid HTML.
In HTML, "<tag/blah/" is short for "<tag>blah</tag>", so the
BR section is parsed as
START br
DATA ><a title="link2" href="someurl2">Two<
END br
DATA a>
which is 100% correct. For more on this topic, see:
http://www.cs.tut.fi/~jkorpela/html/empty.html
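
The short-tag reading described above can be reproduced with a small sketch. This is a toy tokenizer, not sgmllib's actual code. The pattern NET_SHORTTAG and the function net_events are assumptions made for illustration, and the sketch expands only the "<tag/blah/" form, treating everything else as data.

```python
import re

# Assumed pattern for the SGML short form <tag/data/ (illustration only).
NET_SHORTTAG = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/')


def net_events(text):
    """Return (kind, value) events, expanding only <tag/data/ short tags."""
    events = []
    pos = 0
    for m in NET_SHORTTAG.finditer(text):
        if m.start() > pos:
            # Text before the short tag is passed through as data.
            events.append(("DATA", text[pos:m.start()]))
        events.append(("START", m.group(1)))
        if m.group(2):
            events.append(("DATA", m.group(2)))
        events.append(("END", m.group(1)))
        pos = m.end()
    if pos < len(text):
        events.append(("DATA", text[pos:]))
    return events


fragment = '<br/><a title="link2" href="someurl2">Two</a>'
for kind, value in net_events(fragment):
    print(kind, value)
```

Under this reading, the slash in </a> closes the br short tag, which is why the second link never reaches the start tag callback for a.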
----------------------------------------------------------------------