[Python-3000] str/unicode tests: pyexpat.c and read(n)
"Martin v. Löwis" martin at v.loewis.de
Sun Jul 22 18:30:26 CEST 2007
Guido van Rossum wrote:
> On 7/22/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>>> Sure, normally XML is serialized to bytes, but it is also
>>> serializable to unicode, and that's a useful feature to have (if
>>> implementable).
>> It's not reasonably implementable; users who have use cases will
>> have to encode as UTF-8 first.
>
> Now I'm confused. Are we proposing that all our XML APIs read and
> write encoded bytes, or are we proposing that they read and write
> Unicode strings, leaving the encoding/decoding to the I/O stream?
Unicode strings in both cases.
I was not talking about writing at all; pyexpat only does reading (aka parsing). It returns Unicode strings, but processes bytes.
> I thought the latter was preferred but now it looks like you're
> arguing for the former?
The XML parser input stream should be byte-oriented. XML has its own notion of input encoding (expressed in the XML declaration, <?xml...); it's the job of the parser to figure it out. Having the user provide a character-oriented stream to the parser is both inconvenient and error-prone: the application would have to figure out the encoding itself first.
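As a minimal sketch of that point, assuming the xml.parsers.expat module as it ships in released Python 3: the parser is handed raw bytes, reads the encoding from the XML declaration itself, and passes Unicode strings to the handlers. The helper name parse_text and the two sample documents are made up for illustration.

```python
import xml.parsers.expat


def parse_text(data: bytes) -> str:
    """Parse an XML document given as bytes and return its character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


# The same document serialized in two encodings. Each one names its
# encoding in the XML declaration, so the caller never has to state it.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>Löwis</name>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?><name>Löwis</name>'
).encode("utf-8")

latin1_text = parse_text(latin1_doc)
utf8_text = parse_text(utf8_doc)
```

The byte sequences differ, but both calls yield the same str. With a character-oriented stream, the application would have had to choose the right codec for each document before the parser ever saw the declaration.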
Regards, Martin