[Python-bugs-list] [ python-Bugs-500073 ] HTMLParser fail to handle '&foobar'
noreply@sourceforge.net
Sun, 06 Jan 2002 00:06:23 -0800
Bugs item #500073, was opened at 2002-01-06 00:06
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=500073&group_id=5470
Category: Extension Modules
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Bernard YUE (berniey)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser fail to handle '&foobar'
Initial Comment:
HTMLParser does not distinguish between &foobar; and
&foobar. The latter is still treated as a
charref/entityref. Below is my proposed fix:
File: sgmllib.py
# SGMLParser.goahead()
# line 162-176
# from
elif rawdata[i] == '&':
    match = charref.match(rawdata, i)
    if match:
        name = match.group(1)
        self.handle_charref(name)
        i = match.end(0)
        if rawdata[i-1] != ';': i = i-1
        continue
    match = entityref.match(rawdata, i)
    if match:
        name = match.group(1)
        self.handle_entityref(name)
        i = match.end(0)
        if rawdata[i-1] != ';': i = i-1
        continue
# to
elif rawdata[i] == '&':
    match = charref.match(rawdata, i)
    if match:
        if rawdata[match.end(0)-1] != ';':
            # not really a charref: pass the '&' through as data
            self.handle_data(rawdata[i])
            i = i+1
        else:
            name = match.group(1)
            self.handle_charref(name)
            i = match.end(0)
        continue
    match = entityref.match(rawdata, i)
    if match:
        if rawdata[match.end(0)-1] != ';':
            # not really an entityref: pass the '&' through as data
            self.handle_data(rawdata[i])
            i = i+1
        else:
            name = match.group(1)
            self.handle_entityref(name)
            i = match.end(0)
        continue
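To illustrate the intended behaviour outside of sgmllib, here is a minimal standalone sketch. It is not the sgmllib code: the CHARREF and ENTITYREF patterns and the tokenize() helper are hypothetical stand-ins written for this example, and the handler calls are replaced by a list of (kind, value) events. It only shows the rule proposed above, namely that a reference without a terminating ';' is passed through as data.

```python
import re

# Hypothetical, simplified patterns (assumptions, not the ones in sgmllib.py).
# Group 1 is the reference name/number, group 2 is the optional ';'.
CHARREF = re.compile(r'&#([0-9]+)(;?)')
ENTITYREF = re.compile(r'&([a-zA-Z][-.a-zA-Z0-9]*)(;?)')


def tokenize(rawdata):
    """Return (kind, value) events; a reference lacking ';' becomes data."""
    events = []
    i = 0
    n = len(rawdata)
    while i < n:
        if rawdata[i] == '&':
            emitted_ref = False
            for kind, pattern in (('charref', CHARREF),
                                  ('entityref', ENTITYREF)):
                match = pattern.match(rawdata, i)
                if match and match.group(2) == ';':
                    # Properly terminated reference: report it and skip it.
                    events.append((kind, match.group(1)))
                    i = match.end(0)
                    emitted_ref = True
                    break
            if not emitted_ref:
                # Not really a reference: emit the '&' itself as data.
                events.append(('data', '&'))
                i += 1
            continue
        # Plain text up to the next '&' (or the end of the input).
        j = rawdata.find('&', i)
        if j < 0:
            j = n
        events.append(('data', rawdata[i:j]))
        i = j
    return events


if __name__ == '__main__':
    for event in tokenize('a &foobar; b &foobar c'):
        print(event)
```

With this sketch, '&foobar;' is reported as an entityref event, while the '&' of the unterminated '&foobar' is emitted as data and the text after it is handled as ordinary data.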