[Python-Dev] Unicode entities in XML cause problems :-(
Matthias Urlichs smurf@noris.de
Sun, 28 Apr 2002 06:16:10 +0200
- Previous message: [Python-Dev] Unicode entities in XML cause problems :-(
- Next message: [Python-Dev] Unicode entities in XML cause problems :-(
Hi,
Martin v. Loewis:
> The proper fix, IMO, is to have writexml accept an encoding argument and, by default, write the output as UTF-8. Then there is no need for character or entity references. The encoding should probably default to the one from the document header (UTF-8 if that isn't given).
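A minimal sketch of that behaviour in present-day Python 3, whose `xml.dom.minidom` `toxml()` method accepts an `encoding` argument (the sample document text is made up for illustration). With UTF-8 output, non-ASCII characters are written as raw bytes and no character or entity references are needed:

```python
from xml.dom import minidom

# Illustrative document containing non-ASCII characters (u-umlaut, euro sign).
doc = minidom.parseString("<p>M\u00fcller paid \u20ac5</p>")

# Passing an encoding makes toxml() return bytes in that encoding and
# declare it in the XML header; the characters themselves are emitted as-is.
data = doc.toxml(encoding="utf-8")
print(data)
```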
> In any case, emitting &uuml; and &euro; in XML is wrong: you cannot use them unless your document type provides them - you should not assume that all XML files use the HTML DTD.

Good point. On the other hand, I didn't plan to do that anyway. ;-) (Are &#1234; and friends OK with any DTD?)
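Numeric character references such as `&#1234;` are part of XML itself, so a conforming parser accepts them without any DTD, whereas named entities such as `&uuml;` must be declared somewhere (only `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;` are predefined). A quick check with Python's standard `xml.etree.ElementTree`; the helper `parses` and the sample strings are illustrative only:

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A numeric character reference needs no DTD: &#1234; is U+04D2.
print(ET.fromstring("<p>&#1234;</p>").text == "\u04d2")  # True
# A predefined entity is also fine without a DTD.
print(parses("<p>&amp;</p>"))   # True
# &uuml; is only defined by a DTD (e.g. HTML's), so a bare parser rejects it.
print(parses("<p>&uuml;</p>"))  # False
```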
> Please see http://python.org/sf/432401. Walter is working on such a codec.

Thank you.
> For XML escaping, the approach suggested by this patch would be to use xmlcharrefreplace() (see the test script) as the error handler.

But that doesn't help with &, <, > and ". Personally, I rather dislike having to do a separate replace() for these.
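The two-step approach being objected to might look like this in present-day Python, where the error handler is selected by the name `"xmlcharrefreplace"` passed to `str.encode()`; the function name `escape_xml_two_pass` is hypothetical. The handler only fires for characters the target encoding cannot represent, and `&`, `<`, `>` and `"` are plain ASCII, so they still need separate `replace()` calls:

```python
def escape_xml_two_pass(text, encoding="ascii"):
    # Step 1: escape markup-significant characters by hand. "&" must go
    # first, otherwise the "&" inside "&lt;" etc. would be escaped again.
    text = (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))
    # Step 2: characters the target encoding cannot hold become &#NNNN;.
    return text.encode(encoding, "xmlcharrefreplace")

print(escape_xml_two_pass('a < b & "\u20ac"'))
# b'a &lt; b &amp; &quot;&#8364;&quot;'
```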
One approach would be to use character maps which have strategic holes where & < > and possibly " live..?
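A rough analogue of the "strategic holes" idea, substituting `str.translate` for a real charmap codec (so this is not the codec-level mechanism itself): instead of leaving holes that the error handler must fill, the hypothetical table `XML_MARKUP_MAP` has entries only at the four markup characters, every other code point passes through untouched, and whatever the target encoding cannot hold is then left to `xmlcharrefreplace`. The markup escaping thus happens in one pass rather than a chain of `replace()` calls:

```python
# Hypothetical table: only the four markup-significant code points are mapped.
XML_MARKUP_MAP = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
    ord('"'): "&quot;",
}

def escape_xml_translate(text, encoding="ascii"):
    # Single pass: replacement strings are not re-scanned by translate(),
    # so the "&" in "&amp;" is never escaped a second time.
    escaped = text.translate(XML_MARKUP_MAP)
    # Anything the target encoding cannot represent becomes &#NNNN;.
    return escaped.encode(encoding, "xmlcharrefreplace")

print(escape_xml_translate('<a href="x">\u04d2 & \u20ac</a>'))
# b'&lt;a href=&quot;x&quot;&gt;&#1234; &amp; &#8364;&lt;/a&gt;'
```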
-- Matthias Urlichs | noris network AG | http://smurf.noris.de/