[issue5498] Can SGMLParser properly handle <empty/> tags?
Éric Araujo
report at bugs.python.org
Wed Jan 27 14:40:38 CET 2010
Éric Araujo <merwok at netwok.org> added the comment:
Hello
XML-style empty tags of the form <tag/> are an SGML hack or, more precisely, the combination of two SGML features. In SGML, the forward slash closes the tag, and the following angle bracket is character data, not part of the tag.
The W3C validator uses a real SGML parser for HTML doctypes, and fails on XML-like /> constructs: http://validator.w3.org/check?uri=data%3Atext%2Fhtml%2C%3C!DOCTYPE+html+PUBLIC+%22-%2F%2FW3C%2F%2FDTD+HTML+4.01%2F%2FEN%22+%22http%3A%2F%2Fwww.w3.org%2FTR%2Fhtml4%2Fstrict.dtd%22%3E+%3Chtml%3E+%3Chead%3E+++%3Ctitle%3ETest%3C%2Ftitle%3E+++%3Cmeta+name%3Dtest+content%3Done%2F%3E+++%3Cmeta+name%3Dbug+content%3Dtwo%3E+%3C%2Fhead%3E+%3Cbody%3E+++%3Cp%3ETest%3C%2Fp%3E+%3C%2Fbody%3E+%3C%2Fhtml%3E&charset=%28detect+automatically%29&doctype=Inline&group=1&verbose=1
The complete explanation can be read at http://www.cs.tut.fi/~jkorpela/html/empty.html
In conclusion, sgmllib is right. Use an XML parser for XML or an HTML5 parser for HTML.
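As a rough illustration of that advice, here is a minimal sketch using only the Python 3 standard library. It assumes xml.etree.ElementTree as the XML parser and html.parser.HTMLParser as a lenient HTML parser; the latter is a stand-in and not a full HTML5 parser, and the TagCollector class and the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML parser: <empty/> is a complete element with no content.
root = ET.fromstring("<root><empty/><full>text</full></root>")
xml_tags = [child.tag for child in root]


class TagCollector(HTMLParser):
    """Hypothetical helper that records the parser events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for XML-style tags such as <br/>.
        self.events.append(("startend", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("data", data))


# Lenient HTML parser: <br/> is reported as one start-and-end event,
# and the ">" is not treated as character data.
collector = TagCollector()
collector.feed("<p>one<br/>two</p>")
collector.close()
html_events = collector.events

print(xml_tags)
print(html_events)
```

Either parser treats the trailing slash the way the document's author most likely intended, which is exactly what a strict SGML parser does not do.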
Kind regards
----------
nosy: +Merwok
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5498>
_______________________________________