[issue41748] HTMLParser: parsing error

karl report at bugs.python.org
Sun Jan 3 03:22:14 EST 2021


karl <karl+pythonbugs at la-grange.net> added the comment:

Ezio,

TL;DR: I tested in browsers and added two tests for this issue.
       Should I create a PR just for the tests?

https://github.com/python/cpython/blame/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/test/test_htmlparser.py#L479-L485


A: comma without spaces
-----------------------


Tests for browsers:
data:text/html,<!doctype html><div class=bar,baz=asd>text</div>

Serializations:
* Firefox, Gecko (86.0a1 (2020-12-28) (64-bit)) 
* Edge, Blink (Version 89.0.752.0 (Version officielle) Canary (64 bits))
* Safari, WebKit (Release 117 (Safari 14.1, WebKit 16611.1.7.2))

Same serialization in all 3 rendering engines:
<div class="bar,baz=asd">text</div>


Adding:

    def test_comma_between_unquoted_attributes(self):
        # bpo 41748
        self._run_check('<div class=bar,baz=asd>',
                        [('starttag', 'div', [('class', 'bar,baz=asd')])])


❯ ./python.exe -m test -v test_htmlparser

…
test_comma_between_unquoted_attributes (test.test_htmlparser.HTMLParserTestCase) ... ok
…

Ran 47 tests in 0.168s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 369 ms
Tests result: SUCCESS


So this is working as expected for the first test.
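
For anyone who wants to check this outside the test suite, here is a minimal sketch. EventCollector and collect_events are hypothetical helpers written for this comment, not the ones from test_htmlparser.py; they only record start tags and text.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records start tags and text as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('starttag', tag, attrs))

    def handle_data(self, data):
        self.events.append(('data', data))


def collect_events(source):
    """Feed the whole source at once and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


# Case A: comma without spaces inside an unquoted attribute value.
events_a = collect_events('<div class=bar,baz=asd>')
print(events_a)
# [('starttag', 'div', [('class', 'bar,baz=asd')])]
```

The whole string after "class=" ends up as a single attribute value, which matches what the three browsers serialize.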


B: comma with spaces
--------------------

Tests for browsers:
data:text/html,<!doctype html><div class=bar ,baz=asd>text</div>

Serializations:
* Firefox, Gecko (86.0a1 (2020-12-28) (64-bit)) 
* Edge, Blink (Version 89.0.752.0 (Version officielle) Canary (64 bits))
* Safari, WebKit (Release 117 (Safari 14.1, WebKit 16611.1.7.2))

Same serialization in all 3 rendering engines:
<div class="bar" ,baz="asd">text</div>


Adding:

    def test_comma_with_space_between_unquoted_attributes(self):
        # bpo 41748
        self._run_check('<div class=bar ,baz=asd>',
                        [('starttag', 'div', [
                            ('class', 'bar'),
                            (',baz', 'asd')])])


❯ ./python.exe -m test -v test_htmlparser


This is failing.

======================================================================
FAIL: test_comma_with_space_between_unquoted_attributes (test.test_htmlparser.HTMLParserTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_htmlparser.py", line 493, in test_comma_with_space_between_unquoted_attributes
    self._run_check('<div class=bar ,baz=asd>',
  File "/Users/karl/code/cpython/Lib/test/test_htmlparser.py", line 95, in _run_check
    self.fail("received events did not match expected events" +
AssertionError: received events did not match expected events
Source:
'<div class=bar ,baz=asd>'
Expected:
[('starttag', 'div', [('class', 'bar'), (',baz', 'asd')])]
Received:
[('data', '<div class=bar ,baz=asd>')]

----------------------------------------------------------------------
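
The same failure can be reproduced without the test harness. This is only a sketch: TagOrData and parse are hypothetical names, and the result depends on the Python version you run it with, so I am not hard-coding an expected output.

```python
from html.parser import HTMLParser


class TagOrData(HTMLParser):
    """Hypothetical helper: shows whether input came back as a tag or as text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('starttag', tag, attrs))

    def handle_data(self, data):
        self.events.append(('data', data))


def parse(source):
    parser = TagOrData()
    parser.feed(source)
    parser.close()  # flush anything the parser was still buffering
    return parser.events


# Case B: space before the comma, so the comma starts the next attribute name.
events_b = parse('<div class=bar ,baz=asd>')
kind = events_b[0][0]
# 'data' means the start tag was not recognized (the failure reported above);
# 'starttag' means the parser coped with the comma.
print(kind, events_b)
```

On the checkout used above, this should print the 'data' variant, that is, the whole tag handed back as text.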


I started looking into the code of parser.py, which I'm not (yet) familiar with.

https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/html/parser.py#L42-L52

Do you have a suggestion to fix it?

----------
nosy: +karlcow

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41748>
_______________________________________

