[Python-Dev] XML codec?

"Martin v. Löwis" martin at v.loewis.de
Fri Nov 9 19:55:51 CET 2007


>> So what if the unicode string doesn't start with an XML declaration?
>> Will it add one?
>
> No.

Ok. So the XML document would be ill-formed then unless the encoding is UTF-8, right?

> The point of this code is not just to return whether the string starts with "<?xml" or not. There are actually three cases:

Still, it's overly complex for that matter:

> * The string does start with "<?xml"

if s.startswith("<?xml"): return Yes

> * The string starts with a prefix of "<?xml", i.e. we can only decide if it starts with "<?xml" if we have more input.

if "<?xml".startswith(s): return Maybe

> * The string definitely doesn't start with "<?xml".

return No
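
Put together, the three branches above can be sketched as one small runnable function. This is only an illustration of the suggestion: the names Match and starts_with_xml_decl are made up for this example and stand in for the Yes/Maybe/No placeholders; they are not taken from the patch under discussion.

```python
from enum import Enum


class Match(Enum):
    """Hypothetical three-valued result, standing in for Yes/Maybe/No."""
    YES = "yes"
    MAYBE = "maybe"
    NO = "no"


XML_DECL_START = "<?xml"


def starts_with_xml_decl(s: str) -> Match:
    """Report whether s starts with "<?xml", or might once more input arrives."""
    if s.startswith(XML_DECL_START):
        # The input already contains the full marker.
        return Match.YES
    if XML_DECL_START.startswith(s):
        # The input is a proper prefix of the marker (this includes the
        # empty string), so more input is needed to decide.
        return Match.MAYBE
    return Match.NO
```

Note that an empty input falls into the Maybe case, which is what an incremental decoder would want before it has seen any data.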

>> What bit fiddling are you referring to specifically that you think is better done in C than in Python?
>
> The code that checks the byte signature, i.e. the first part of detectxmlencodingstr().

I can't see any bit fiddling there, except for the bit mask of candidates. For the candidate list, I cannot quite understand why you need a bit mask at all, since the candidates are rarely overlapping.

I think there could be a much simpler routine to have the same effect.
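
As a rough sketch of what a mask-free candidate check might look like (an assumption made for illustration, not the routine from the patch and not necessarily the one meant here), the same startswith idea can be applied to an ordered list of byte signatures. Only byte order marks are covered; the BOM-less signatures such as "<?xml" encoded in UTF-16 or UTF-32 are left out for brevity. The names SIGNATURES, NEED_MORE and detect_bom are hypothetical.

```python
import codecs

# Hypothetical sentinel meaning "cannot decide yet, feed more bytes".
NEED_MORE = object()

# Longer signatures come first, because the UTF-16-LE BOM (FF FE) is a
# prefix of the UTF-32-LE BOM (FF FE 00 00). That is the only overlap.
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes):
    """Return an encoding name, NEED_MORE, or None if no BOM is present."""
    for signature, encoding in SIGNATURES:
        if data.startswith(signature):
            return encoding
        if signature.startswith(data):
            # data is a proper prefix of this signature: undecided.
            return NEED_MORE
    return None
```

With this ordering, the input b"\xff\xfe" is reported as undecided, since it could still turn out to be the start of the UTF-32-LE signature.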

Regards, Martin
