[Python-bugs-list] [ python-Bugs-453059 ] Nasty bug in HTMLParser.py
noreply@sourceforge.net
Mon, 20 Aug 2001 01:16:36 -0700
Bugs item #453059, was opened at 2001-08-19 13:41
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=453059&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Chris Withers (fresh)
>Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Nasty bug in HTMLParser.py
Initial Comment:
If you feed the following string to an HTMLParser
parser, you get _very_ weird results:
'one & two & three &three; &blagh ;'
What I would expect would be:
- call to handle_data(data='one & two & three ')
- call to handle_entityref(name='three')
- call to handle_data(data=' &blagh ;')
What you actually get is:
- call to handle_data(data='one ')
- call to handle_data(data='one ')
...which is very wrong :-S
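
For anyone wanting to try this, here is a minimal reproduction sketch. It
assumes the modern html.parser module (the Python 3 successor to
HTMLParser.py) with convert_charrefs=False, so that handle_entityref is
still called; RecordingParser and record_events are hypothetical names used
only for illustration. The exact sequence of calls depends on the Python
version, so no expected output is shown.

```python
from html.parser import HTMLParser

# The string from the report above.
SAMPLE = 'one & two & three &three; &blagh ;'


class RecordingParser(HTMLParser):
    """Hypothetical helper: records every data and entityref callback."""

    def __init__(self):
        # convert_charrefs=False keeps entity references as separate
        # handle_entityref calls instead of folding them into the data.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))


def record_events(text):
    """Feed text to a fresh parser and return the recorded callbacks."""
    parser = RecordingParser()
    parser.feed(text)
    parser.close()  # flush anything the parser is still buffering
    return parser.events


if __name__ == "__main__":
    for kind, value in record_events(SAMPLE):
        print(kind, repr(value))
```

Comparing the printed calls with the expected list above shows whether a
given version still repeats or drops data.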
Now, I'm not sure of the validity of the associated
HTML*, but if it's invalid, I would have thought
exceptions would be thrown rather than the above result.
In any case, I have a module that demonstrates this
problem which is available from:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/squishdot/stripogram/
It has a testsuite that runs with Zope's testrunner.py
and I just added a test to demonstrate this problem.
Any help would be very much appreciated...
Chris
* The string 'one & two & three &three; &blagh ;'
displays exactly as is in Mozilla, IE and Netscape. Of
course, that doesn't mean the W3C will like it ;-) I'd
prefer to go with the majority rather than being
'right' on this one.