BeautifulSoup bug when ">>>" found in attribute value

John Nagle nagle at animats.com
Tue Dec 26 16:36:14 EST 2006


This, which is from a real web site, went into BeautifulSoup:

<param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer
fantastic rates for selected weeks or days!!&blinkt=Click here
>>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />

And this came out, via prettify:

<addresssnippet siteurl="http%3A//apartmentsapart.com" 
url="http%3A//www.apartmentsapart.com/Europe/Spain/Madrid/FAQ">
     <param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer 
fantastic rates for selected weeks or days!!&blinkt=Click here 
>>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408">
 >>&linkurl;=/Europe/Spain/Madrid/Apartments/Offer/2408" />
</param>

BeautifulSoup seems to be confused by the ">>>" inside a quoted
attribute value.  It parsed the tag correctly at first, but then
emitted an extra, totally bogus line.  Note the entity "&linkurl;", which
appears nowhere in the original.  It looks like the code that handles a
missing quote mark did the wrong thing.
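
For comparison, here is a minimal sketch of the behavior I would expect, written against the standard library's html.parser in current Python 3 rather than BeautifulSoup itself. The names SNIPPET, Collector and parse are made up for this illustration. It only checks that the ">>>" stays inside the quoted attribute value and that no stray text follows the tag; it does not reproduce or fix the BeautifulSoup bug.

```python
from html.parser import HTMLParser

# The <param> tag from the report, with its original line breaks.
SNIPPET = (
    '<param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer\n'
    'fantastic rates for selected weeks or days!!&blinkt=Click here\n'
    '>>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />'
)


class Collector(HTMLParser):
    """Hypothetical helper: records every start tag and every run of text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []   # list of (tag_name, {attr: value}) tuples
        self.text = []   # character data seen outside of tags

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags via the default handle_startendtag.
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


def parse(markup):
    """Feed markup through Collector and return the populated collector."""
    collector = Collector()
    collector.feed(markup)
    collector.close()
    return collector


if __name__ == "__main__":
    result = parse(SNIPPET)
    for tag, attrs in result.tags:
        print(tag, attrs)
    print("stray text:", repr("".join(result.text)))
```

A correct parse should report exactly one "param" tag whose "value" attribute still contains the ">>>", with no text left over after it.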

				John Nagle



