Issue 33303: ElementTree Comment text isn't escaped
Hmm, sorry, I was wrong here. I looked it up and also checked the behaviour of other libraries: the data content of PIs is application-specific and must not be escaped at all. It's not XML character data.
Sorry for the confusion and the extra work on your side.
I'm really sorry again, but I only consulted the XML spec on this now (and also the way libxml2 does it), and I found that XML comment text actually does not get escaped. It's not character data, and, in fact, "--" is not even allowed at all inside of comments. (Funnily enough, the HTML serialiser does escaping for both comments and PIs, but, well, that's HTML, I guess…)
https://www.w3.org/TR/REC-xml/#sec-comments
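For anyone who wants to see the difference between the two serialisers, here is a small sketch. It assumes a CPython whose `xml.etree.ElementTree` behaves as described above; the sample text "a & b" and the PI target name "target" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample nodes; the text and the PI target are arbitrary illustrations.
comment = ET.Comment("a & b")
pi = ET.ProcessingInstruction("target", "a & b")

# Default (XML) serialiser: comment and PI content is written verbatim.
xml_comment = ET.tostring(comment, encoding="unicode")
xml_pi = ET.tostring(pi, encoding="unicode")

# HTML serialiser: the same content goes through character-data escaping.
html_comment = ET.tostring(comment, encoding="unicode", method="html")
html_pi = ET.tostring(pi, encoding="unicode", method="html")

for label, value in [
    ("xml comment", xml_comment),
    ("xml pi", xml_pi),
    ("html comment", html_comment),
    ("html pi", html_pi),
]:
    print(f"{label}: {value}")
```

If the behaviour matches the description, the two XML lines should keep the bare "&" while the two HTML lines show it as "&amp;".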
Sorry, Jeffrey, I should have looked that up in the spec much earlier, before you invested so much time into this.
There are two disallowed cases: "--" in the text content, and "-" at the end of the text (which would lead to a "--->"). Now, the thing is, such validation is currently unprecedented in ElementTree, so I don't know if we should start raising exceptions from the serialiser for this case, and if so, which ones. Since comments are rare, doing that won't hurt performance, but once we get started on this, users would probably also want their text and attribute content and their tag and attribute names to be validated, and that would hurt performance.
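Users who want this check today can do it on their side before building the comment. A minimal sketch of the two cases above follows; `check_comment_text` and `safe_comment` are hypothetical helper names, not part of ElementTree, and the choice of `ValueError` is only an assumption since no exception type was decided here.

```python
import xml.etree.ElementTree as ET


def check_comment_text(text):
    """Return text unchanged, or raise ValueError if it cannot be
    placed inside an XML comment (hypothetical helper, not an ET API)."""
    if "--" in text:
        raise ValueError('"--" is not allowed inside an XML comment')
    if text.endswith("-"):
        # A trailing "-" would serialise as "--->", which is also invalid.
        raise ValueError('XML comment text must not end with "-"')
    return text


def safe_comment(text):
    """Build an ET.Comment only after the text passed the check."""
    return ET.Comment(check_comment_text(text))


if __name__ == "__main__":
    for candidate in ("fine - really", "a -- b", "trailing-"):
        try:
            node = safe_comment(candidate)
        except ValueError as exc:
            print(f"rejected {candidate!r}: {exc}")
        else:
            print(ET.tostring(node, encoding="unicode"))
```

Doing the check at construction time keeps the serialiser itself untouched, so it adds no cost for documents that never use comments.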
So, I will have to reject the PR and this ticket.