SGMLParser bug? can't parse <br/>
Fredrik Lundh
fredrik at pythonware.com
Thu May 15 14:30:33 EDT 2003
Tung Wai Yip wrote:
> I am trying to use sgmllib.SGMLParser to parse the following HTML
>
> ----------------
> <html>
> <body>
> <br/>
> </body>
> </html>
> ----------------
>
> The output is rather messed up.
>
> start tag: <html>
> data: '\r\n'
> start tag: <body>
> data: '\r\n '
> start tag: <br>
> data: '>\r\n<' <-- mess up
> end tag: </br>
> data: 'body>\r\n' <-- mess up
> end tag: </html>
> data: '\r\n'
>
> No problem if I use <br> instead of <br/>.
or "<br />" (note the space before the slash), which is the
recommended way to use this XML/XHTML feature in HTML:
http://www.w3.org/TR/xhtml1/#guidelines
for more on empty elements in HTML, see:
http://www.cs.tut.fi/~jkorpela/html/empty.html
(if I understand things correctly, sgmllib should really treat
"<br/>" as <br> followed by </br> followed by the character
data ">" (!))
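if you cannot change the markup at its source, one possible workaround is to rewrite "<br/>" as "<br />" before feeding the text to the parser. the following is only a minimal sketch: the helper name add_space_before_slash is hypothetical (it is not part of sgmllib), and it only handles tags without attributes.

```python
import re

# Matches an empty-element tag written with no space before the slash,
# e.g. <br/> or <hr/>. Tags that carry attributes are deliberately left
# alone in this sketch.
_COMPACT_EMPTY_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/>")


def add_space_before_slash(html):
    """Rewrite <br/> as <br /> (hypothetical helper, not part of sgmllib)."""
    return _COMPACT_EMPTY_TAG.sub(r"<\1 />", html)


if __name__ == "__main__":
    sample = "<html>\n<body>\n<br/>\n</body>\n</html>\n"
    # The cleaned text could then be passed to the parser's feed() method.
    print(add_space_before_slash(sample))
```

a regular expression is a blunt tool for markup, so treat this as a stopgap for simple input rather than a general fix.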
> Is there any place to report Python library bugs?
click on "bugs" on the www.python.org home page. for more
information, see the developer faq:
http://www.python.org/dev/devfaq.html#bugs
</F>