[Python-Dev] XML codec?

Walter Dörwald walter at livinglogic.de
Fri Nov 9 13:49:41 CET 2007


Martin v. Löwis wrote:

>> Because you can force the encoder to use a specified encoding. If you do this and the unicode string starts with an XML declaration
>
> So what if the unicode string doesn't start with an XML declaration? Will it add one?

No.

> If so, what version number will it use?

If we added this, we could give the encoder constructor an extra argument, version, defaulting to '1.0'.

>>>> OK, so should I put the C code into a xml module?
>>>
>>> I don't see the need for C code at all.
>>
>> Doing the bit fiddling for Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the right thing to do.
>
> Hmm. I don't think a sequence like
>
> + if (strlen>0)
> + {
> +     if (*str++ != '<')
> +         return 1;
> +     if (strlen>1)
> +     {
> +         if (*str++ != '?')
> +             return 1;
> +         if (strlen>2)
> +         {
> +             if (*str++ != 'x')
> +                 return 1;
> +             if (strlen>3)
> +             {
> +                 if (*str++ != 'm')
> +                     return 1;
> +                 if (strlen>4)
> +                 {
> +                     if (*str++ != 'l')
> +                         return 1;
> +                     if (strlen>5)
> +                     {
> +                         if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
> +                             return 1;
>
> is well-maintainable C. I feel it is much better writing
>
> if not s.startswith("<?xml"):
>     return 1

The point of this code is not just to return whether the string starts with "<?xml" or not. There are actually three cases:
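The cases themselves are not spelled out here. One plausible reading, suggested by the nested length checks in the quoted C code, is that the input either definitely starts with an XML declaration, definitely does not, or is still too short to tell. The following is only a sketch of that reading in Python; the names DeclState and starts_with_xml_decl are hypothetical and do not come from the patch.

```python
from enum import Enum


class DeclState(Enum):
    """Hypothetical three-way result; not the return codes of the C patch."""
    NO = "no"                  # input cannot be the start of an XML declaration
    INCOMPLETE = "incomplete"  # input is a proper prefix; more data is needed
    YES = "yes"                # input starts with "<?xml" followed by whitespace


PREFIX = "<?xml"
WHITESPACE = " \t\r\n"


def starts_with_xml_decl(s):
    """Classify s by whether it begins with an XML declaration (assumed semantics)."""
    head = s[:len(PREFIX)]
    # Any mismatch within the first five characters rules the declaration out.
    if not PREFIX.startswith(head):
        return DeclState.NO
    # Everything seen so far matches, but the whitespace after "<?xml"
    # has not arrived yet, so no decision is possible.
    if len(s) <= len(PREFIX):
        return DeclState.INCOMPLETE
    # The character after "<?xml" must be whitespace, as in the C check.
    if s[len(PREFIX)] in WHITESPACE:
        return DeclState.YES
    return DeclState.NO
```

A plain s.startswith("<?xml") cannot distinguish the middle case from the last one, which matters when the input arrives in pieces.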

> What bit fiddling are you referring to specifically that you think is better done in C than in Python?

The code that checks the byte signature, i.e. the first part of detect_xml_encoding_str().
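For illustration only, a byte-signature check of this general kind could look roughly like the sketch below in Python. It covers byte order marks only, using the BOM constants from the standard codecs module; the function name guess_encoding_from_signature is hypothetical, and the sketch does not try to reproduce whatever else detect_xml_encoding_str() handles (for example input without a BOM, or input that is too short to decide).

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_signature(data):
    """Return an encoding name if data (bytes) starts with a known BOM, else None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```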

Servus, Walter


