Issue 24197: minidom parses comments wrongly

from xml.dom import minidom

html = """

"""

minidom.parseString(html)

Result:
Traceback (most recent call last):
  File "minidom.py", line 10, in <module>
    minidom.parseString(html)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1928, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 34

Tested versions: 2.7.6, 2.7.3

Reason: the "--" between "obraz" and "super;" inside the comment.

Thanks for your report. Alas, according to the W3C XML 1.0 specification:

"For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments."

So, it appears minidom (and other XML parsers) are correct in rejecting your example as not well-formed XML.

http://www.w3.org/TR/xml/#sec-comments
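
For illustration, here is a minimal sketch of the rule in action. The markup from the original report was not preserved, so the two documents below are hypothetical stand-ins, and `is_well_formed` is a helper name invented for this example rather than part of minidom. The only difference between the inputs is a single versus a double hyphen inside the comment.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical inputs (not the reporter's markup): identical except for
# the number of hyphens inside the comment body.
GOOD = "<root><!-- a - b --></root>"
BAD = "<root><!-- a -- b --></root>"


def is_well_formed(xml_text):
    """Return True if minidom parses xml_text, False if expat rejects it."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True


print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

If the input is HTML rather than XML, an HTML parser is probably the better fit, since minidom only accepts well-formed XML.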