[issue33303] ElementTree Comment text isn't escaped

Stefan Behnel report at bugs.python.org
Sun May 12 03:38:02 EDT 2019


Stefan Behnel <stefan_ml at behnel.de> added the comment:

I'm really sorry again, but I only consulted the XML spec on this now (and also the way libxml2 does it), and I found that XML comment text actually does not get escaped. It's not character data, and, in fact, "--" is not even allowed at all inside comments. (Funnily enough, the HTML serialiser does escaping for both comments and PIs, but, well, that's HTML, I guess…)

https://www.w3.org/TR/REC-xml/#sec-comments
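
To illustrate the point, here is a minimal sketch using the standard library's xml.etree.ElementTree. The sample comment text is made up for the example. It serialises the same comment with the XML and the HTML serialiser so the two outputs can be compared.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text containing characters that would be
# escaped if this were ordinary character data.
comment = ET.Comment(" a < b & c ")

# Default method is "xml": the comment text is written out verbatim.
xml_bytes = ET.tostring(comment)
print(xml_bytes)

# Same comment through the HTML serialiser, for comparison.
html_bytes = ET.tostring(comment, method="html")
print(html_bytes)
```

With the XML method, the "<" and "&" should come out unchanged between "<!--" and "-->".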

Sorry, Jeffrey, I should have looked that up in the spec much earlier, before you invested so much time into this.

There are two disallowed cases: "--" in the text content, and "-" at the end of the text (which would lead to a "--->"). Now, the thing is, such validation is currently unprecedented in ElementTree, so I don't know if we should start raising exceptions from the serialiser for this case, and if so, which ones. Since comments are rare, doing that won't hurt performance, but once we get started on this, users would probably also want their text and attribute content and their tag and attribute names to be validated, and that would hurt performance.
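
For users who want this check anyway, the two disallowed cases could be tested in user code before building the comment. The following is only a sketch: check_comment_text is a hypothetical helper, not part of ElementTree, and the choice of ValueError is an assumption, not something decided in this ticket.

```python
import xml.etree.ElementTree as ET


def check_comment_text(text):
    """Hypothetical helper: reject text that is not allowed in an XML comment.

    Covers only the two cases named above: "--" anywhere in the text,
    and "-" as the last character (which would serialise as "--->").
    """
    if "--" in text:
        raise ValueError('"--" is not allowed inside an XML comment')
    if text.endswith("-"):
        raise ValueError('an XML comment must not end with "-"')
    return text


# Usage: validate first, then create the comment as usual.
comment = ET.Comment(check_comment_text(" safe text "))
```

Keeping the check outside the serialiser means only callers who opt in pay for it.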

So, I will have to reject the PR and this ticket.

----------
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33303>
_______________________________________

