"<![CDATA[]]" vs. BeautifulSoup

John Nagle nagle at animats.com
Thu May 3 15:59:00 EDT 2012


   An HTML page for a major site (http://www.chase.com) has
some incorrect HTML.  It contains

	<![CDATA[]]

which is not valid HTML, XML, or SGML.  However, most browsers
ignore it.  BeautifulSoup treats it as the start of a CDATA section
and consumes the rest of the document as CDATA.
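
   To illustrate the behavior described above, here is a minimal
sketch of a scanner that, once it sees "<![CDATA[", looks for the
"]]>" terminator and treats everything up to it as CDATA.  This is
not BeautifulSoup's actual code; split_cdata is a hypothetical helper
written only to show why a missing ">" can swallow the rest of a page.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def split_cdata(html):
    """Split html around its first CDATA opener.

    Hypothetical helper (an assumption for illustration, not a
    BeautifulSoup API). Returns (before, cdata_body, after);
    cdata_body is None when no opener is present.
    """
    start = html.find(CDATA_OPEN)
    if start == -1:
        return html, None, ""
    body_start = start + len(CDATA_OPEN)
    end = html.find(CDATA_CLOSE, body_start)
    if end == -1:
        # No "]]>" anywhere: everything to end of input becomes CDATA.
        return html[:start], html[body_start:], ""
    return html[:start], html[body_start:end], html[end + len(CDATA_CLOSE):]


# Well-formed CDATA: the section ends at "]]>".
print(split_cdata("a<![CDATA[x]]>b"))

# The malformed marker from the page: no ">" after "]]",
# so the remaining markup is consumed as CDATA.
print(split_cdata("<p>before</p><![CDATA[]]<p>after</p>"))
```

   With the malformed input, the third element of the result is
empty and the trailing markup ends up inside the CDATA body, which
matches the symptom reported here.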

   Bug?

					John Nagle


