[issue32876] HTMLParser raises exception on some inputs
Steven D'Aprano
report at bugs.python.org
Mon Feb 19 18:02:09 EST 2018
Steven D'Aprano <steve+python at pearwood.info> added the comment:
The stdlib HTML parser requires correct HTML.
To parse broken HTML, as you find in the real world, you need a third-party library like BeautifulSoup. BeautifulSoup is much more complex (roughly 7-8 times as many lines of code) but can handle nearly anything a browser can.
I doubt the stdlib will ever compete with BeautifulSoup.
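
For code that has to stay on the stdlib parser, one option is to guard the call so that an exception on unusual input does not take down the caller. The following is only a minimal sketch: LinkCollector and collect_links are hypothetical names made up for illustration, and whether a given input raises at all depends on the Python release in use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup):
    """Return (links, error); error is None if parsing finished cleanly."""
    parser = LinkCollector()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:
        # Assumption: on failure, keep whatever was collected before the
        # parser gave up and hand the exception back to the caller.
        return parser.links, exc
    return parser.links, None


links, error = collect_links(
    '<p><a href="https://bugs.python.org/issue32876">issue</a></p>'
)
print(links, error)
```

This only contains the failure; it does not repair the markup. If the third-party bs4 package is installed, BeautifulSoup(markup, "html.parser") is the more forgiving route described above.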
----------
nosy: +steven.daprano
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32876>
_______________________________________