BeautifulSoup vs. Microsoft

Duncan Booth duncan.booth at invalid.invalid
Thu Mar 29 07:52:21 EDT 2007


"Justin Ezequiel" <justin.mailinglists at gmail.com> wrote:

> On Mar 29, 4:08 pm, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
>> John Nagle <n... at animats.com> wrote:
>> >      title="<!--http://www.microsoft.com/usability/information.mspx->"
>>
>> > is supposed to be an HTML comment.  But it's improperly terminated.
>>
>> It is an attribute value, and unescaped angle brackets are valid in
>> attributes. It looks to me like a bug in BeautifulSoup.
> 
> FWIW, see http://tinyurl.com/yjtzjz
> 
> new fan of BeautifulSoup here as it helped me parse "BAD" XML
> (although my client would disagree with that description)
> 
I'm right behind BeautifulSoup's ability to parse bad HTML, but I still 
think it should give priority to being able to parse valid HTML without 
messing it up.
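
For anyone who wants to check the attribute-value point, here is a minimal
sketch using the standard library's html.parser rather than BeautifulSoup.
Only the title attribute comes from the thread; the surrounding <a href="#">
element and the AttrCollector class are made up for illustration, since the
original tag was not quoted. A parser that treats quoted attribute values as
opaque text should report one start tag and no comments.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records start tags and comments as they are parsed."""

    def __init__(self):
        super().__init__()
        self.attrs_seen = []      # list of (tag, {attribute: value})
        self.comments_seen = []   # list of comment bodies

    def handle_starttag(self, tag, attrs):
        self.attrs_seen.append((tag, dict(attrs)))

    def handle_comment(self, data):
        self.comments_seen.append(data)


# The title value is the one quoted in this thread; the <a> wrapper is assumed.
TITLE = "<!--http://www.microsoft.com/usability/information.mspx->"
markup = '<a href="#" title="%s">link</a>' % TITLE

parser = AttrCollector()
parser.feed(markup)
parser.close()

print(parser.attrs_seen)
print(parser.comments_seen)
```

If the angle brackets inside the quotes were mistaken for markup, the title
value would come back truncated or a comment would be reported.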


