msg57211 - (view) |
Author: Walter Dörwald (doerwalter) *  |
Date: 2007-11-07 17:52 |
The patch adds an XML codec. It implements encoding detection as specified in http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing and supports externally specified encodings for both encoding and decoding. |
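The detection described in the linked section of the XML recommendation can be sketched roughly as follows. This is not the patch's code: the helper name `guess_xml_encoding`, its `default` parameter, and the returned codec names are assumptions made for illustration, and EBCDIC handling is left out.

```python
import codecs
import re

# Byte order marks. UTF-32 is checked first because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]

# How "<" or "<?" looks at the start of a BOM-less document in wide encodings.
_PREFIXES = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# Encoding pseudo-attribute of an XML declaration in an ASCII-compatible encoding.
_DECL_RE = re.compile(
    rb"<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def guess_xml_encoding(data, default="utf-8"):
    """Guess the encoding of an XML byte string (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in _PREFIXES:
        if data.startswith(prefix):
            return name
    match = _DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: use the caller's default
    # (UTF-8 unless an external encoding is supplied).
    return default
```

A caller could then decode with `data.decode(guess_xml_encoding(data))`. Passing an externally specified encoding as `default` only covers the case where the document itself gives no hint.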
|
|
msg57213 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2007-11-07 17:53 |
I think it's good to add this; I don't have time to review though. |
|
|
msg57221 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2007-11-07 19:43 |
Nice codec! The only nit I have is the name: "xml" isn't intuitive enough. I had to read the code to figure out what the codec actually does. "xml" used as an encoding usually refers to having Unicode text converted to ASCII with XML entity escapes for all non-ASCII characters. How about "xml-auto-detect" or something along those lines? |
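For comparison, the "ASCII with XML entity escapes" meaning mentioned here is already reachable in Python through the standard `xmlcharrefreplace` error handler, as in this small illustration (the sample string is arbitrary):

```python
# Encode to ASCII, replacing every non-ASCII character with a numeric
# XML character reference instead of raising UnicodeEncodeError.
text = "Dörwald"
escaped = text.encode("ascii", errors="xmlcharrefreplace")
print(escaped)
```

This is an error handler rather than a codec, which is part of why a codec named plain "xml" could be mistaken for it.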
|
|
msg57222 - (view) |
Author: Walter Dörwald (doerwalter) *  |
Date: 2007-11-07 21:42 |
"xml-auto-detect" sounds OK to me. It even makes sense for the encoder, because it normally detects the encoding to use for writing from the XML declaration. We could put "xml-auto-detect" into the alias mapping and keep xml as the module name. But I noticed that I have to rewrap a lot of lines before I check it in. |
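As a reminder of how the alias mapping behaves for an existing codec, here is a short sketch using the standard `encodings.aliases` table. The "xml" entry in the trailing comment is hypothetical and only mirrors the proposal; it does not exist in the standard library.

```python
import codecs
from encodings.aliases import aliases

# Two spellings of the same codec resolve to one canonical name.
same_codec = codecs.lookup("latin1").name == codecs.lookup("iso-8859-1").name

# The alias table maps a normalized alias to the codec's module name.
module_for_alias = aliases.get("latin1")

# Under the proposal the table would gain an entry along the lines of
#     "xml_auto_detect": "xml"    (hypothetical, never part of the stdlib)
```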
|
|
msg57224 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2007-11-07 21:54 |
Leaving the module name as "xml" would remove that name from the namespace of possible encodings. "xml" as encoding name is problematic, as many people regard writing data in XML as "encoding the data in XML". I'd simply not use it at all, not even for a codec that converts between Unicode and ASCII+XML entities. |
|
|
msg57280 - (view) |
Author: Walter Dörwald (doerwalter) *  |
Date: 2007-11-08 21:25 |
OK, I've changed the name of the codec to xml_auto_detect and added support for EBCDIC. |
|
|
msg57281 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2007-11-08 21:37 |
Thanks, Walter! |
|
|
msg63696 - (view) |
Author: Sean Reifschneider (jafo) *  |
Date: 2008-03-17 17:52 |
Marc-Andre: Is this good to be committed, or does it need to be reviewed further? |
|
|
msg63703 - (view) |
Author: Walter Dörwald (doerwalter) *  |
Date: 2008-03-17 18:14 |
There was resistance in python-dev against this patch (see the thread at http://mail.python.org/pipermail/python-dev/2007-November/075138.html), so this issue should probably be closed as rejected. However, there was consensus that a detect_xml_encoding() function might be useful. |
|
|