HTML parsing bug?

Richard Brodie R.Brodie at rl.ac.uk
Mon Jan 30 09:59:00 EST 2006


<g_no_mail_please at yahoo.com> wrote in message 
news:1138632328.306349.241430 at g44g2000cwa.googlegroups.com...

> Python 2.3.5 seems to choke when trying to parse html files, because it
> doesn't realize that what's inside <!--      --> is a comment in HTML,
> even if this comment is inside <script> </script>, especially if it's a
> comment inside that script code too.

Actually, you are technically incorrect; try validating the code you posted.
Google found this explanation: http://lachy.id.au/log/2005/05/script-comments
Feeding even slightly invalid HTML to the standard library parser will often
choke it. If you can't guarantee clean sources, it is best to run them through
Tidy first, or to use another parser entirely.
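
To see how a parser classifies "<!-- -->" inside <script>, a small probe along
these lines may help. It is only a sketch, and it assumes the html.parser
module from a current Python 3, not the 2.3.5 parser discussed above, so the
behaviour you get on 2.3.5 may well differ. The class name CommentProbe, the
helper probe() and the SAMPLE markup are made up for illustration.

```python
from html.parser import HTMLParser


class CommentProbe(HTMLParser):
    """Record which text arrives as a comment and which as script data."""

    def __init__(self):
        super().__init__()
        self.comments = []      # text passed to handle_comment
        self.script_text = []   # data chunks seen between <script> tags
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        # Script content may arrive in several chunks, so collect them all.
        if self._in_script:
            self.script_text.append(data)


# Hypothetical sample: one ordinary comment, one "comment" inside a script.
SAMPLE = (
    "<html><body>"
    "<!-- a real comment -->"
    '<script type="text/javascript"><!--\nvar x = 1;\n// --></script>'
    "</body></html>"
)


def probe(markup):
    """Return (list of comments, joined script text) for the given markup."""
    parser = CommentProbe()
    parser.feed(markup)
    parser.close()
    return parser.comments, "".join(parser.script_text)


if __name__ == "__main__":
    comments, script = probe(SAMPLE)
    print("comments:", comments)
    print("script data:", repr(script))
```

With this parser the "<!--" inside the script element should come back as
script data rather than through handle_comment, which fits the point that
script content is not ordinary markup.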






More information about the Python-list mailing list