[issue35142] html.entities.html5 should require a trailing semicolon

Daniel Lovell report at bugs.python.org
Fri Nov 2 01:03:45 EDT 2018


New submission from Daniel Lovell <lovell.daniel92 at gmail.com>:

The keys of html.entities.html5 should require a trailing semicolon. The Python docs say:

html.entities.html5
"A dictionary that maps HTML5 named character references [1] to the equivalent Unicode character(s), e.g. html5['gt;'] == '>'. Note that the trailing semicolon is included in the name (e.g. 'gt;'), however some of the names are accepted by the standard even without the semicolon: in this case the name is present with and without the ';'. See also html.unescape()."

https://docs.python.org/3/library/html.entities.html?highlight=html

However, it is not clear without looking at the source which keys require the semicolon and which do not. Looking at the source, the keys that require a trailing semicolon vastly outnumber those that do not.
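
A quick way to see the split without reading the source is to partition the dictionary's keys by whether they end in ';'. This is only a rough sketch using the standard library; the variable names are mine, and the exact counts may differ between Python versions, so none are quoted here.

```python
from html.entities import html5

# Partition the named character references by trailing semicolon.
with_semicolon = [name for name in html5 if name.endswith(";")]
without_semicolon = [name for name in html5 if not name.endswith(";")]

print("with ';':   ", len(with_semicolon))
print("without ';':", len(without_semicolon))

# Semicolon-less names that have no semicolon-terminated counterpart.
# The docs quoted above suggest this list should be empty.
missing_twin = [name for name in without_semicolon if name + ";" not in html5]
print("no ';' counterpart:", missing_twin)

# The documented example: 'gt' is present both with and without ';'.
print(html5["gt;"], html5["gt"])
```

Running this shows how few names are accepted without the semicolon compared with the rest.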

For simplicity and consistency with the w3.org HTML5 Character Entity Reference Chart, I recommend that the trailing semicolon be required, as it is in HTML5: https://dev.w3.org/html5/html-author/charref

My recommendation could then be extended to say we should also require the ampersand, as HTML5 does, but I don't think this change should be taken that far unless others agree.

----------
components: Library (Lib)
messages: 329105
nosy: daniellovell
priority: normal
severity: normal
status: open
title: html.entities.html5 should require a trailing semicolon
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35142>
_______________________________________

