[Python-Dev] Unicode entities in XML cause problems :-(

Michael Gilfix mgilfix@eecs.tufts.edu
Sat, 27 Apr 2002 15:47:58 -0400


I came across this myself before I joined the list. My general rule was to always convert unicode to plain strings for any internal use, with something like "%s" % unicode (I don't remember whether I avoided str() because it also returned unicode). I think the context was that I wanted to use a type attribute from an XML tag to instantiate an object whose class I retrieved from a dict. So I had something like:

    module = self.record['module']
    if not resources.dict.has_key(module):
        raise RuntimeError, \
            "Attempted to retrieve data from non-existent resource module: %s" % module
    code = resources.dict[module]
    obj = apply(code, [], self.record['args'])

... where module was a unicode string. This was an example where unicode sorta transparently pissed me off because it behaved just like a string in so many ways but wasn't.
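For anyone reading this later, here is a rough sketch of the same lookup-and-instantiate pattern with the names normalized up front. It is written for present-day Python 3 rather than the Python 2 code above, and the names to_text, Point, REGISTRY and instantiate are all made up for illustration; REGISTRY stands in for whatever resources.dict held.

```python
def to_text(value):
    """Return value as a native str, decoding bytes as UTF-8 (an assumed encoding)."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


class Point:
    """Hypothetical resource class, used only for this example."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


# Hypothetical registry mapping type names to classes.
REGISTRY = {"Point": Point}


def instantiate(record, registry=REGISTRY):
    """Look up record['module'] in the registry and call it with record['args']."""
    # Normalize the name first so the lookup uses the same string type
    # the registry keys were created with.
    name = to_text(record["module"])
    if name not in registry:
        raise RuntimeError(
            "Attempted to retrieve data from non-existent resource module: %s" % name
        )
    factory = registry[name]
    # Keyword argument names must be native strings as well.
    kwargs = {to_text(key): value for key, value in record.get("args", {}).items()}
    return factory(**kwargs)


obj = instantiate({"module": "Point", "args": {"x": 1, "y": 2}})
```

Doing the conversion in one place at the boundary keeps the rest of the code from having to care which string type the parser handed back.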

             -- Mike

On Sat, Apr 27 @ 21:30, Matthias Urlichs wrote:

Playing around with xml.dom.minidom, I noticed that this beast is perfectly able to read HTML which it can't print:

>>> import xml.dom.minidom as md
>>> d=md.parseString("bߐ"))
>>> d.writexml(sys.stdout)
...
UnicodeError: ASCII encoding error: ordinal not in range(128)

Ouch.
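A minimal sketch of the same situation in present-day Python 3, for comparison. It assumes an input document with a made-up element name, a, containing the character reference &#2000;, and it uses an ASCII-only text stream to stand in for a terminal that cannot encode the character. The explicit-encoding call at the end is one possible way around the error, not necessarily what either poster did.

```python
import io
import xml.dom.minidom as md

# Hypothetical input: element <a> holding "b" plus the reference &#2000; (U+07D0).
doc = md.parseString("<a>b&#2000;</a>")
text = doc.documentElement.firstChild.data

# Writing to an ASCII-only text stream fails once the non-ASCII
# character reaches the stream's encoder.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    doc.writexml(ascii_stream)
    failed = False
except UnicodeError:
    failed = True

# Requesting an explicit encoding returns bytes, which can be written
# to any binary stream regardless of the terminal's encoding.
xml_bytes = doc.toxml(encoding="utf-8")
```

Parsing succeeds in both cases; only the output step depends on what the target stream can encode.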

-- Michael Gilfix mgilfix@eecs.tufts.edu

For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html