[Python-Dev] XML codec?
Walter Dörwald walter at livinglogic.de
Fri Nov 9 13:49:41 CET 2007
Martin v. Löwis wrote:
> > Because you can force the encoder to use a specified encoding. If you do this and the unicode string starts with an XML declaration
>
> So what if the unicode string doesn't start with an XML declaration? Will it add one?
No.
> If so, what version number will it use?
If we added this we could add an extra argument version to the encoder constructor defaulting to '1.0'.
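If the encoder did add a missing declaration, the proposed `version` argument might be used roughly as follows. This is a hypothetical sketch: `XMLDeclEncoder` is not an existing class, its `startswith("<?xml")` test is deliberately simplistic, and an existing declaration is passed through unchanged rather than having its encoding rewritten.

```python
class XMLDeclEncoder:
    """Hypothetical encoder; `version` only matters when a declaration is added."""

    def __init__(self, encoding, version="1.0"):
        self.encoding = encoding
        self.version = version

    def encode(self, text):
        # Prepend a declaration only if the text does not already have one.
        if not text.startswith("<?xml"):
            decl = '<?xml version="%s" encoding="%s"?>' % (self.version, self.encoding)
            text = decl + text
        return text.encode(self.encoding)
```

For example, `XMLDeclEncoder("utf-8").encode("<a/>")` yields `b'<?xml version="1.0" encoding="utf-8"?><a/>'`.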
> > > > OK, so should I put the C code into an xml module?
> > >
> > > I don't see the need for C code at all.
> >
> > Doing the bit fiddling for Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the right thing to do.
>
> Hmm. I don't think a sequence like
>
> +    if (strlen>0)
> +    {
> +        if (*str++ != '<')
> +            return 1;
> +        if (strlen>1)
> +        {
> +            if (*str++ != '?')
> +                return 1;
> +            if (strlen>2)
> +            {
> +                if (*str++ != 'x')
> +                    return 1;
> +                if (strlen>3)
> +                {
> +                    if (*str++ != 'm')
> +                        return 1;
> +                    if (strlen>4)
> +                    {
> +                        if (*str++ != 'l')
> +                            return 1;
> +                        if (strlen>5)
> +                        {
> +                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
> +                                return 1;
>
> is well-maintainable C. I feel it is much better writing
>
>     if not s.startswith("<?xml"):
>         return 1
The point of this code is not just to return whether the string starts with "<?xml" or not. There are actually three cases:
- The string does start with "<?xml"
- The string is itself only a prefix of "<?xml" (e.g. "<?x"), i.e. we can only decide whether it starts with "<?xml" once we have more input.
- The string definitely doesn't start with "<?xml".
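To make the three cases concrete, here is a Python sketch of a tri-state check. The helper name `starts_with_xml_decl()` is hypothetical; it returns True, False, or None for "undecided, need more input". Like the C excerpt, it also requires whitespace after "<?xml", so the five-character string "<?xml" on its own is still undecided.

```python
def starts_with_xml_decl(s):
    """Hypothetical helper: True, False, or None (need more input)."""
    prefix = "<?xml"
    if len(s) <= len(prefix):
        # Too short to decide, unless it has already diverged from "<?xml".
        return None if prefix.startswith(s) else False
    if not s.startswith(prefix):
        return False
    # "<?xml" must be followed by whitespace to be a declaration.
    return s[len(prefix)] in " \t\r\n"
```

A plain `startswith()` collapses the first and second cases into one, which is why the incremental decoder needs the extra "undecided" result.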
> What bit fiddling are you referring to specifically that you think is better done in C than in Python?
The code that checks the byte signature, i.e. the first part of detect_xml_encoding_str().
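For comparison, a byte-signature check along the lines of the autodetection table in Appendix F of the XML 1.0 specification can be sketched in Python. This is a rough sketch, not the code in the patch: `detect_signature()` is a hypothetical name, EBCDIC detection is omitted, and anything without a recognised signature falls back to UTF-8, after which the encoding declaration itself would have to be parsed.

```python
import codecs

# Signatures from the XML 1.0 autodetection table, mapped to Python codec
# names. Longer signatures come first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    # No BOM: recognise the encoded form of "<" or "<?".
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]


def detect_signature(data):
    """Hypothetical helper: guess a codec from the leading bytes of data.

    Returns a codec name, or None if data is still a proper prefix of
    some signature and more input is needed to decide.
    """
    for signature, encoding in _SIGNATURES:
        if data.startswith(signature):
            return encoding
        if signature.startswith(data):
            # e.g. b"\xff\xfe" could still turn into the UTF-32-LE BOM.
            return None
    # No signature: assume an ASCII-compatible encoding (UTF-8 is the XML
    # default); the encoding declaration would be examined next.
    return "utf-8"
```

The same three-way outcome shows up here: a match, a definite non-match, or "need more bytes", and the ordering of the table is what resolves the FF FE ambiguity.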
Servus, Walter