[Python-Dev] Unicode entities in XML cause problems :-(
Matthias Urlichs smurf@noris.de
Sat, 27 Apr 2002 21:30:57 +0200
Playing around with xml.dom.minidom, I noticed that this beast is perfectly able to read HTML which it can't print:
    import sys
    import xml.dom.minidom as md
    d = md.parseString("<a>b&#2000;</a>")
    d.writexml(sys.stdout)
    ...
    UnicodeError: ASCII encoding error: ordinal not in range(128)
Ouch.
Scanning the sources revealed various ways to replace '&' with '&amp;', but no generic codec for [ht|x]ml-escaped character entities.
Thus, my proposal (which I'm going to implement since I need it...) is to write such a codec. For simplicity, I propose to accept &uuml; and &euro; and friends, but to emit them as &#1234; (or whatever).
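To make the proposal concrete, here is a minimal sketch of such an encode/decode pair. It is written for present-day Python 3, not the interpreter this message was written against, and it is not the implementation the post promises. The function names encode_entities and decode_entities are hypothetical. The sketch assumes the "xmlcharrefreplace" error handler and the html module, both of which were added to Python after this message was posted.

```python
import html


def encode_entities(text: str) -> bytes:
    """Encode text as ASCII, emitting numeric character references.

    Hypothetical helper: markup characters (&, <, >) are escaped first,
    then every non-ASCII character becomes a decimal reference (&#NNNN;).
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace")


def decode_entities(data: bytes) -> str:
    """Decode ASCII bytes, accepting named and numeric references alike.

    Hypothetical helper: html.unescape understands the HTML named
    entities (&uuml;, &euro;, ...) as well as &#NNNN; and &#xHHHH;.
    """
    return html.unescape(data.decode("ascii"))


if __name__ == "__main__":
    print(encode_entities("b\u07d0"))
    print(decode_entities(b"&uuml; and &euro; and &#1234;"))
```

The asymmetry matches the proposal: decoding is liberal (named and numeric forms), while encoding only ever emits numeric references, so the output needs no DTD to be read back.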
After this codec is written, all occurrences of string.replace('&','&amp;') (and vice versa) within the standard library can be replaced with the appropriate encode/decode methods.
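As an illustration of the kind of hand-rolled replacement meant here, the sketch below compares a chained replace with the escape and unescape helpers in xml.sax.saxutils, again in present-day Python 3. The name naive_escape is hypothetical and only stands in for the scattered string.replace calls; the saxutils helpers handle the three markup characters only, not non-ASCII text.

```python
from xml.sax.saxutils import escape, unescape


def naive_escape(text: str) -> str:
    """Hypothetical stand-in for the scattered string.replace calls.

    The '&' replacement has to run first; otherwise the ampersands
    introduced by '&lt;' and '&gt;' would be escaped a second time.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    sample = "a < b & c"
    print(naive_escape(sample))
    print(escape(sample))
    print(unescape(escape(sample)))
```

Centralising the ordering rule in one pair of functions is the practical gain: each call site no longer has to get the replacement order right on its own.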
Thoughts? Or am I totally blind, such a codec already exists, and I have missed it?
-- Matthias Urlichs | noris network AG | http://smurf.noris.de/