[Python-Dev] Unicode entities in XML cause problems :-(
Martin v. Loewis martin@v.loewis.de
28 Apr 2002 01:28:09 +0200
- Previous message: [Python-Dev] Unicode entities in XML cause problems :-(
- Next message: [Python-Dev] Unicode entities in XML cause problems :-(
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
"Matthias Urlichs" <smurf@noris.de> writes:
> >>> import xml.dom.minidom as md
> >>> d=md.parseString("bߐ"))
> >>> d.writexml(sys.stdout)
> ...
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> [...]
>
> Thus, my proposal (which I'm going to implement since I need it...) is
> to write such a codec. For simplicity, I propose to accept &uuml; and
> &euro; and friends, but to emit them as &#1234; (or whatever).
The proper fix, IMO, is to have writexml accept an encoding argument, and, by default, write the output as UTF-8. Then there is no need for character or entity references.
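As a rough illustration of UTF-8 output, here is a minimal sketch. It assumes a present-day Python 3 minidom, where toxml() takes an encoding argument and returns bytes. That is an assumption about later library versions, not the API as it stood when this message was written, and the sample document is made up.

```python
import xml.dom.minidom as md

# Hypothetical sample document containing one non-ASCII character (U+00FC).
doc = md.parseString("<a>b\u00fc</a>")

# Serialize to bytes. With a UTF-8 encoding the character is written
# directly, so no character or entity reference is needed.
utf8_bytes = doc.toxml(encoding="utf-8")

print(utf8_bytes)
```

The XML declaration in the output names the encoding, so a conforming parser can read the raw bytes back without any DTD.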
In any case, emitting &uuml; and &euro; in XML is wrong: you cannot use them unless your document type provides them - you should not assume that all XML files use the HTML DTD.
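A small sketch of the difference, assuming a current Python 3 with the expat-based minidom parser. The helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.dom.minidom as md
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if minidom can parse the text, False on a parse error."""
    try:
        md.parseString(text)
    except ExpatError:
        return False
    return True


# A named entity with no DTD declaring it is rejected by the parser.
named_ok = is_well_formed("<a>&uuml;</a>")

# A numeric character reference needs no declaration at all.
numeric_ok = is_well_formed("<a>&#252;</a>")

# Python's built-in error handler produces numeric references directly.
char_refs = "\u00fc\u20ac".encode("ascii", "xmlcharrefreplace")

print(named_ok, numeric_ok, char_refs)
```

Only the numeric form is safe to emit into an arbitrary XML document, since it does not depend on the document type.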
> After this codec is written, all occurrences of
> string.replace('&','&amp;') (and vice versa) within the standard library
> can be replaced with the appropriate encode/decode methods.
Please see http://python.org/sf/432401. Walter is working on such a codec.
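For comparison, the hand-written replace calls mentioned above can already be avoided with the helpers in xml.sax.saxutils. This is only a sketch of that existing alternative, not the codec under discussion, and the sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical text containing the characters that need escaping in XML.
raw = "AT&T <tag>"

# escape() replaces &, < and > with their predefined entity references.
escaped = escape(raw)

# unescape() reverses the substitution.
restored = unescape(escaped)

print(escaped)
print(restored)
```

These helpers only cover the predefined XML entities; non-ASCII characters are left to the output encoding.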
Regards, Martin