[issue17410] Generator-based HTMLParser

Wed Mar 13 20:24:32 CET 2013

flying sheep added the comment:

no, i didn’t change anything that didn’t have to be changed to expose the tokens. i kept the changes as minimal as possible.

and the tests pass! i attached the patch.
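
for readers without the patch at hand, here is a rough sketch of what a generator-style interface over the parser could look like. it is not the patch's approach: it only wraps the public handler methods of html.parser.HTMLParser and leaves the internals alone. the names TokenCollector and tokenize and the tuple layout are assumptions made up for this illustration.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Hypothetical helper: records handler calls as (kind, ...) tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pending = []  # tokens seen since the last drain

    def handle_starttag(self, tag, attrs):
        self.pending.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.pending.append(("endtag", tag))

    def handle_data(self, data):
        self.pending.append(("data", data))


def tokenize(chunks):
    """Yield tokens lazily while feeding an iterable of str chunks."""
    collector = TokenCollector()
    for chunk in chunks:
        collector.feed(chunk)
        yield from collector.pending
        collector.pending.clear()
    # close() flushes any text the parser was still buffering
    collector.close()
    yield from collector.pending
    collector.pending.clear()
```

a caller would then loop with `for token in tokenize(["<p>hi</p>"]): ...` instead of subclassing the parser and overriding handlers.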

---

aside thoughts:

i had to change _markupbase.py, too, but i wonder why it’s even a separate module: it is only ever imported by html.parser, and its only content, ParserBase, is subclassed exactly once (by HTMLParser). the two classes are so intertwined (ParserBase calls HTMLParser methods that it doesn’t define itself) that i think _markupbase should just be scrapped and merged into html.parser.
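
the coupling described above can be shown with a small stand-in. SketchParserBase and SketchHTMLParser are hypothetical classes written for this illustration, not the stdlib code; they only mimic the pattern of a base class calling a handler that nothing but its subclass defines.

```python
import re


class SketchParserBase:
    """Hypothetical stand-in playing the role of _markupbase.ParserBase."""

    _comment = re.compile(r"<!--(.*?)-->", re.S)

    def parse_comment(self, rawdata, i):
        match = self._comment.match(rawdata, i)
        if match is None:
            return -1  # no complete comment starts at position i
        # handle_comment is not defined on this class; the call only
        # works when a subclass supplies it.
        self.handle_comment(match.group(1))
        return match.end()


class SketchHTMLParser(SketchParserBase):
    """Hypothetical stand-in playing the role of HTMLParser."""

    def __init__(self):
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


parser = SketchHTMLParser()
end = parser.parse_comment("<!-- hi -->rest", 0)
```

calling parse_comment on a bare SketchParserBase raises AttributeError, which is the sense in which the base class cannot stand on its own.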

----------
keywords: +patch
Added file: http://bugs.python.org/file29401/htmltokenizer.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17410>
_______________________________________