[issue42821] HTMLParser: subsequent duplicate attributes should be ignored

Ezio Melotti report at bugs.python.org
Wed Jan 6 01:46:27 EST 2021


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

If we follow the behavior of the browser, we will have to pick one of the two values and discard the other, making the discarded value inaccessible.  If we provide both, scripts and libraries that use HTMLParser have access to both and can decide what to do.

For example, BeautifulSoup already does the right thing:
>>> import bs4
>>> bs4.BeautifulSoup('<!doctype html><div class="bar" class="foo">text</div>')
<!DOCTYPE html>
<html><body><div class="bar">text</div></body></html>
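
As a rough illustration of the point about leaving the decision to the caller, here is a minimal sketch using only html.parser from the standard library. The names AttrCollector and first_wins are hypothetical helpers made up for this example and are not part of HTMLParser; the sketch assumes the caller wants the browser-like rule where the first occurrence of an attribute wins.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the raw attribute list of each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; duplicate names are
        # kept, in source order, rather than being merged or dropped.
        self.seen.append((tag, attrs))


def first_wins(attrs):
    """Hypothetical helper: keep the first value seen for each attribute name."""
    result = {}
    for name, value in attrs:
        # setdefault only stores the value if the name is not present yet.
        result.setdefault(name, value)
    return result


parser = AttrCollector()
parser.feed('<div class="bar" class="foo">text</div>')
parser.close()

tag, attrs = parser.seen[0]
print(attrs)              # [('class', 'bar'), ('class', 'foo')]
print(first_wins(attrs))  # {'class': 'bar'}
```

Because both pairs reach handle_starttag, a caller that prefers a different policy (last value wins, or raising an error on duplicates) can implement it in the same place without any change to HTMLParser.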

Changing this might also break code that relies on the current behavior.  I'm therefore going to close this as "not a bug".

----------
assignee:  -> ezio.melotti
nosy: +ezio.melotti
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type:  -> behavior

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42821>
_______________________________________