SGMLParser bug? can't parse <br/>

Fredrik Lundh fredrik at pythonware.com
Thu May 15 14:30:33 EDT 2003


Tung Wai Yip wrote:

> I tried to use sgmllib.SGMLParser to parse the following HTML:
>
> ----------------
> <html>
> <body>
>   <br/>
> </body>
> </html>
> ----------------
>
> The output is rather messed up.
>
> start tag: <html>
> data: '\r\n'
> start tag: <body>
> data: '\r\n  '
> start tag: <br>
> data: '>\r\n<'            <-- messed up
> end tag: </br>
> data: 'body>\r\n'         <-- messed up
> end tag: </html>
> data: '\r\n'
>
> No problem if I use <br> instead of <br/>.

or "<br />", which is the recommended way to use this XML/XHTML
feature in HTML:

    http://www.w3.org/TR/xhtml1/#guidelines
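if you can't change the markup you're given, one option is to add the
space yourself before handing the text to the parser.  a minimal sketch,
assuming the trouble is limited to a slash written directly after the
tag name (as in the quoted output); `space_empty_tags` is a hypothetical
helper, not part of sgmllib:

```python
import re

# Hypothetical pre-processing step (not part of sgmllib): match a tag name
# followed directly by "/>", e.g. "<br/>" or "<hr/>".
EMPTY_NO_SPACE = re.compile(r"<([a-zA-Z][-.a-zA-Z0-9]*)/>")

def space_empty_tags(html):
    """Rewrite '<name/>' as '<name />'; all other markup is left alone."""
    return EMPTY_NO_SPACE.sub(r"<\1 />", html)

print(space_empty_tags("<body>\r\n  <br/>\r\n</body>"))
```

the result can then be passed to the parser's feed() method as before.
tags that carry attributes (e.g. `<img src="x"/>`) are deliberately not
touched; this sketch makes no claim about how sgmllib treats them.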

for more on empty elements in HTML, see:

    http://www.cs.tut.fi/~jkorpela/html/empty.html

(if I understand things correctly, sgmllib should really produce
<br> followed by </br> followed by character data ">" (!))
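the output in the original report is consistent with sgmllib reading
"<br/" as the start of SGML's short-tag form `<name/data/`: everything
up to the next "/" (here, the one in "</body>") becomes the element's
content.  the regex below is an illustrative assumption about that form,
not sgmllib's actual source, but it reproduces the quoted split exactly:

```python
import re

# Illustrative regex for SGML's short-tag form "<name/data/" (an assumption,
# not copied from sgmllib): group 1 is the tag name, group 2 the data up to
# the next "/".
SHORTTAG = re.compile(r"<([a-zA-Z][-.a-zA-Z0-9]*)/([^/]*)/")

def split_shorttag(text):
    """Return (tag, data, rest) for the first short tag in text, or None."""
    m = SHORTTAG.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2), text[m.end():]

html = "<html>\r\n<body>\r\n  <br/>\r\n</body>\r\n</html>\r\n"
tag, data, rest = split_shorttag(html)
print("start tag: <%s>" % tag)                 # start tag: <br>
print("data: %r" % data)                       # data: '>\r\n<'
print("end tag: </%s>" % tag)                  # end tag: </br>
print("data: %r" % rest.split("</html>")[0])   # data: 'body>\r\n'
```

with "<br />" the space means no short tag is found at all
(split_shorttag returns None), which fits the observation that the
spaced form parses cleanly.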

> Is there any place to report Python library bugs?

click on "bugs" on the www.python.org home page.  for more
information, see the developer faq:

    http://www.python.org/dev/devfaq.html#bugs

</F>

More information about the Python-list mailing list