[New-bugs-announce] [issue41489] HTMLParser : HTMLParser.error creating multiple errors.

AbcSxyZ report at bugs.python.org
Wed Aug 5 16:04:25 EDT 2020


New submission from AbcSxyZ <rossi.sim at outlook.com>:

This comes from a deprecated feature (HTMLParser.error). Using Python 3.7.3.

Related to, and probably fixed by, https://bugs.python.org/issue31844.
Reporting it just in case.

I've hit two related problems, the first one causing the second.

Using the attached file (error.txt) and this class:
```
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """ DOM parser to retrieve href of all <a> elements """

    def parse_links(self, html_content):
        self.links = []
        self.feed(html_content)
        return self.links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = {key.lower():value for key, *value in attrs}
            urls = attrs.get("href", None)
            if urls and urls[0]:
                self.links.append(urls[0])

    # def error(self, *args, **kwargs):
    #     pass

if __name__ == "__main__":
    with open("error.txt") as File:
        LinkParser().parse_links(File.read())

```

With the error method commented out, it raises:
```
  File "scanner/link.py", line 8, in parse_links
    self.feed(html_content)
  File "/usr/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.7/_markupbase.py", line 159, in parse_marked_section
    self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
  File "/usr/lib/python3.7/_markupbase.py", line 34, in error
    "subclasses of ParserBase must override error()")
NotImplementedError: subclasses of ParserBase must override error()
```

If the error method is defined but does not raise anything (its body is only `pass`), it raises:
```
  File "/home/simon/Documents/radio-parser/scanner/link.py", line 8, in parse_links
    self.feed(html_content)
  File "/usr/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.7/_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment
```
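
A possible workaround on 3.7 that avoids both tracebacks above, as a sketch only: override `error` so that it raises a dedicated exception, and catch that exception around `feed`. The `HTMLParseError` class below is a hypothetical name defined for this example and is not provided by `html.parser` in 3.7.

```python
from html.parser import HTMLParser


class HTMLParseError(Exception):
    """Hypothetical exception type defined for this sketch only."""


class LinkParser(HTMLParser):
    """Collect the href of every <a> element."""

    def parse_links(self, html_content):
        self.links = []
        try:
            self.feed(html_content)
        except HTMLParseError:
            # Malformed markup: keep the links collected before the error.
            pass
        return self.links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs with lowercased names.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def error(self, message):
        # Raising here stops the caller before it can continue with
        # variables that were never assigned.
        raise HTMLParseError(message)
```

After an error the parser still holds unparsed input, so it is probably safer to create a new `LinkParser` (or call `reset()`) before parsing another document.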

In `parse_marked_section` (below), the `match` variable is never assigned when
the `else` branch calls `self.error`. Because `error` does not raise an exception,
execution continues to `if not match:` and raises UnboundLocalError:

```
    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectName, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectName in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)

```
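
The same control flow can be reproduced outside the standard library. The sketch below is an illustration, not the actual fix: `find_end_buggy`, `find_end_guarded`, `ignore` and the two regexes are names and simplified patterns made up for this example. The guarded variant raises by itself whenever the error callback returns normally, so `match` is never read unassigned.

```python
import re

# Simplified stand-ins for the two closing patterns (assumed, not copied).
CLOSE = re.compile(r"\]\]>")
MS_CLOSE = re.compile(r"\]>")

STANDARD = {"temp", "cdata", "ignore", "include", "rcdata"}
MS_OFFICE = {"if", "else", "endif"}


def find_end_buggy(rawdata, keyword, on_error):
    """Same shape as parse_marked_section(): `match` is unset in the else branch."""
    if keyword in STANDARD:
        match = CLOSE.search(rawdata, 3)
    elif keyword in MS_OFFICE:
        match = MS_CLOSE.search(rawdata, 3)
    else:
        on_error("unknown status keyword %r in marked section" % keyword)
    if not match:  # UnboundLocalError here if on_error() returned normally
        return -1
    return match.end(0)


def find_end_guarded(rawdata, keyword, on_error):
    """Variant that raises itself if on_error() returns instead of raising."""
    if keyword in STANDARD:
        match = CLOSE.search(rawdata, 3)
    elif keyword in MS_OFFICE:
        match = MS_CLOSE.search(rawdata, 3)
    else:
        message = "unknown status keyword %r in marked section" % keyword
        on_error(message)
        raise ValueError(message)
    if not match:
        return -1
    return match.end(0)


def ignore(message):
    """Error handler that swallows the message, like `def error(...): pass`."""
```

With `ignore` as the callback and an unknown keyword, `find_end_buggy` fails with UnboundLocalError, while `find_end_guarded` raises a ValueError that carries the original message.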

----------
files: error.txt
messages: 374899
nosy: AbcSxyZ
priority: normal
severity: normal
status: open
title: HTMLParser : HTMLParser.error creating multiple errors.
type: crash
versions: Python 3.7
Added file: https://bugs.python.org/file49370/error.txt

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41489>
_______________________________________

