[Python-bugs-list] [ python-Bugs-500073 ] HTMLParser fail to handle '&foobar'

noreply@sourceforge.net
Sun, 06 Jan 2002 00:06:23 -0800


Bugs item #500073, was opened at 2002-01-06 00:06
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=500073&group_id=5470

Category: Extension Modules
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Bernard YUE (berniey)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser fail to handle '&foobar'

Initial Comment:
HTMLParser does not distinguish between &foobar; and 
&foobar.  The latter is still treated as a 
charref/entityref.  Below is my proposed fix:

File:  sgmllib.py

# SGMLParser.goahead()
# lines 162-176
# from
            elif rawdata[i] == '&':
                match = charref.match(rawdata, i)
                if match:
                    name = match.group(1)
                    self.handle_charref(name)
                    i = match.end(0)
                    if rawdata[i-1] != ';': i = i-1
                    continue
                match = entityref.match(rawdata, i)
                if match:
                    name = match.group(1)
                    self.handle_entityref(name)
                    i = match.end(0)
                    if rawdata[i-1] != ';': i = i-1
                    continue
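Why the branch above misfires: the charref and entityref patterns end in a one-character terminator class rather than a literal ';'. As a result, '&foobar ' and '&foobar<' match just as '&foobar;' does, and the original code uses the terminator only to decide whether to step back one character. The sketch below assumes the two patterns mirror sgmllib's module-level definitions. They are copied in so the sketch runs without sgmllib, which Python 3 no longer ships. `classify` is a hypothetical helper for illustration.

```python
import re

# Assumed to mirror sgmllib's module-level patterns (sgmllib itself is
# not shipped with Python 3). Each pattern ends in a one-character
# terminator class, not a literal ';'.
charref = re.compile('&#([0-9]+)[^0-9]')
entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')


def classify(rawdata):
    """Return (kind, name, terminator) for a reference at the start of
    rawdata, or None if neither pattern matches."""
    for kind, pattern in (('charref', charref), ('entityref', entityref)):
        match = pattern.match(rawdata)
        if match:
            # The last matched character is whatever ended the name.
            return kind, match.group(1), rawdata[match.end(0) - 1]
    return None


for text in ['&foobar;', '&foobar ', '&foobar<', '&#65 ', '&foobar']:
    print(repr(text), classify(text))
```

Only a bare '&foobar' at the very end of the input fails to match, because the pattern needs some terminator character after the name.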

# to
            elif rawdata[i] == '&':
                match = charref.match(rawdata, i)
                if match:
                    if rawdata[match.end(0)-1] != ';':
                        # not really a charref
                        self.handle_data(rawdata[i])
                        i = i+1
                    else:
                        name = match.group(1)
                        self.handle_charref(name)
                        i = match.end(0)
                    continue
                match = entityref.match(rawdata, i)
                if match:
                    if rawdata[match.end(0)-1] != ';':
                        # not really an entityref
                        self.handle_data(rawdata[i])
                        i = i+1
                    else: 
                        name = match.group(1)
                        self.handle_entityref(name)
                        i = match.end(0)
                    continue
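The following toy scanner compares the two branches side by side. It handles only text and '&' references, with no tags and no buffering of incomplete input, so it is a sketch and not SGMLParser itself. `scan` and its `strict` flag are hypothetical: strict=False follows the original branch and strict=True follows the proposed one. The regexes are assumed to match sgmllib's charref/entityref.

```python
import re

# Assumed to mirror sgmllib's module-level charref/entityref patterns.
charref = re.compile('&#([0-9]+)[^0-9]')
entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')


def scan(rawdata, strict):
    """Toy scanner for plain text and '&' references only (no tags).

    strict=False follows the original goahead() branch; strict=True
    follows the proposed fix. Returns a list of (event, value) tuples.
    """
    events = []
    i, n = 0, len(rawdata)
    while i < n:
        j = rawdata.find('&', i)
        if j < 0:
            j = n
        if i < j:
            # Everything up to the next '&' is ordinary data.
            events.append(('data', rawdata[i:j]))
            i = j
            continue
        # Here rawdata[i] == '&': try charref first, then entityref.
        for kind, pattern in (('charref', charref), ('entityref', entityref)):
            match = pattern.match(rawdata, i)
            if match:
                break
        else:
            # Neither pattern matched: pass the '&' through as data.
            events.append(('data', '&'))
            i += 1
            continue
        terminated = rawdata[match.end(0) - 1] == ';'
        if strict and not terminated:
            # Proposed fix: no ';', so not really a reference.
            events.append(('data', '&'))
            i += 1
        else:
            events.append((kind, match.group(1)))
            i = match.end(0)
            if not terminated:
                # Original behaviour: step back so the terminator is rescanned.
                i -= 1
    return events


print(scan('a &foobar b', strict=False))
print(scan('a &foobar b', strict=True))
```

With strict=True, 'a &foobar b' emits the '&' as plain data, while 'a &foobar; b' still produces an entityref event. One consequence to weigh is that the strict version also stops recognizing unterminated references such as '&amp ', which the original branch accepted.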



----------------------------------------------------------------------
