BeautifulSoup vs. Microsoft

John Nagle nagle at animats.com
Thu Mar 29 12:54:55 EDT 2007


Duncan Booth wrote:
> John Nagle <nagle at animats.com> wrote:
> 
> 
>>Strictly speaking, it's Microsoft's fault.
>>
>>     title="<!--http://www.microsoft.com/usability/information.mspx->"
>>
>>is supposed to be an HTML comment.  But it's improperly terminated.
>>It should end with "-->".  So all that following stuff is from what
>>follows the next "-->" which terminates a comment.
> 
> 
> It is an attribute value, and unescaped angle brackets are valid in 
> attributes. It looks to me like a bug in BeautifulSoup.

    I think you're right.  The HTML 4 spec,

	http://www.w3.org/TR/html4/intro/sgmltut.html

says "Note that comments are markup".  Comment delimiters are not
recognized inside a quoted attribute value, so recognizing comment syntax
inside an attribute is, in fact, an error in BeautifulSoup.

    The source HTML on the Microsoft page is thus syntactically correct,
although meaningless.  That's the only place on that page with a
comment-type form in an attribute.
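
    As a rough cross-check, here is a minimal sketch using the html.parser
module from the Python standard library (not BeautifulSoup) to see how
another parser treats the same attribute.  Only the title value comes from
the Microsoft page; the surrounding <a> and <p> elements and the class name
are made up for illustration.

```python
from html.parser import HTMLParser


class AttrAndCommentLogger(HTMLParser):
    """Record every attribute and every comment the parser reports."""

    def __init__(self):
        super().__init__()
        self.attrs = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        self.attrs.extend(attrs)

    def handle_comment(self, data):
        # Called only for text the parser treats as a real comment.
        self.comments.append(data)


# Hypothetical snippet: the title value is the one quoted in this thread,
# the rest of the markup is invented for the test.
SNIPPET = (
    '<a title="<!--http://www.microsoft.com/usability/information.mspx->" '
    'href="#">link</a><p>following text</p>'
)

parser = AttrAndCommentLogger()
parser.feed(SNIPPET)
parser.close()

print("attributes:", parser.attrs)
print("comments:  ", parser.comments)
```

If the parser follows the rule above, the comment list stays empty and the
"<!--" text shows up intact as the value of the title attribute.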

				John Nagle


