BeautifulSoup bug when ">>>" found in attribute value

Anne van Kesteren annevankesteren at gmail.com
Thu Dec 28 04:02:41 EST 2006


Duncan Booth wrote:
> The /> was in the original input that you gave it:
>
> <param name="movie" value="/images/offersBanners/sw04.swf?binfot=We
> offer fantastic rates for selected weeks or days!!&blinkt=Click here
> >>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />
>
> You don't actually *have* to escape > when it appears in html.

You don't have to escape it in XML either, except in character data
when it directly follows ]] (the sequence ]]> is reserved for closing a
CDATA section).
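
A quick way to check the XML rule is to feed a few small documents to
an XML parser. This is only a sketch using Python's standard
xml.etree.ElementTree module; the sample documents and the
is_well_formed helper are made up for illustration and are not from
the original report.

```python
import xml.etree.ElementTree as ET

# A bare ">" is accepted in attribute values and in text content.
elem = ET.fromstring('<a title="1 > 0">Click here >>> now</a>')
print(elem.get("title"))
print(elem.text)


def is_well_formed(doc):
    """Hypothetical helper: True if the XML parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The one exception: "]]>" may not appear literally in text content,
# so the ">" has to be escaped there.
print(is_well_formed("<a>]]></a>"))
print(is_well_formed("<a>]]&gt;</a>"))
```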


> As I said before, it looks like BeautifulSoup decided that the tag ended
> at the first > although it took text beyond that up to the closing " as
> the value of the attribute. The remaining text was then simply treated
> as text content of the unclosed param tag. Finally it inserted a
> </param> to close the unclosed param tag.

In HTML 4.01 the param element has no closing tag: the end tag is
forbidden.

http://www.w3.org/TR/html401/struct/objects.html#h-13.3.2


> Mind you, the sentence before that says 'should' for quoting < characters
> which is just plain silly.

For quoted attribute values it isn't silly at all. It's actually part
of how HTML works.
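
To illustrate, here is a minimal sketch with the standard library's
html.parser (not BeautifulSoup, whose behaviour is what this thread is
about). The AttrCollector class, the collect helper and the shortened
attribute value are assumptions made up for the example. A parser that
respects quoting keeps both < and > inside the quoted value and ends
the tag only at the > after the closing quote.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records start tags and any text content."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


def collect(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser


# Shortened stand-in for the param tag from the report.
result = collect('<param name="movie" value="a < b >>> c">')
print(result.tags)
print(result.text)
```

If the tag had been cut off at the first >, part of the attribute value
would show up in result.text instead of in the attribute.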


--
 Anne van Kesteren
 <http://annevankesteren.nl/>
 <http://www.opera.com/>