[Python-Dev] PEP 383 update: utf8b is now the error handler
"Martin v. Löwis" martin at v.loewis.de
Wed May 6 09:53:33 CEST 2009
> > Second, I suggest "surrogate-replace" as the name of the error handler
> > rather than "utf8b".
> > I think this is bike-shedding.
I don't personally care (I already was aware of UTF-8B), but there are plenty of others who do.
I think it is a fairly bad name, because it is easy to confuse with the "surrogates" error handler (unless you suggest renaming that as well).
You have to fix the existing uses of the obsolete "python-escape", anyway.
Indeed - but only in the PEP. In the implementation, it's already utf8b throughout. Now it is also in the PEP; thanks for pointing that out.
> It's a security risk. If U+DCXX would map to \xXX, then somebody could
> embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets
> sanitized, nobody would expect that this will actually access ../
The odds that anybody will actually take notice of U+002E U+002E U+002F in a string are sufficiently small that any number of exploits have already been based on it. I agree that there is some additional risk from this if people make the check for "../" before they prepend "\udc2e\udc2e\udc2f", but I think that risk is very small compared to the pain of having an error handler whose raison d'être is to not raise exceptions go ahead and raise them anyway.
The problem is that functions like normpath will recognize ../, and that applications rely on them for file name sanitization. If they could be tricked into writing outside of their target folders, this would be a huge security risk.
OTOH, I don't care about breaking applications on misconfigured systems. People using SJIS as their locale encoding have bigger problems than Python raising exceptions.
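
To make the normpath concern concrete, here is a minimal sketch, not taken from the PEP text. It assumes a released Python 3, where the handler this thread calls "utf8b" is spelled "surrogateescape". The helper names (decode_filename, encode_filename, can_smuggle_dotdot) are hypothetical and exist only for illustration. The sketch shows that undecodable bytes round-trip through lone surrogates, while surrogates standing in for ASCII bytes are refused on encoding, so U+DC2E U+DC2E U+DC2F cannot turn into "../" after a normpath check.

```python
import posixpath

# Assumption: released Python 3 spells the handler "surrogateescape";
# this thread refers to the same handler as "utf8b".
HANDLER = "surrogateescape"


def decode_filename(raw: bytes) -> str:
    """Decode bytes as UTF-8, mapping each undecodable byte 0xXX to U+DCXX."""
    return raw.decode("utf-8", errors=HANDLER)


def encode_filename(name: str) -> bytes:
    """Encode back to bytes, turning U+DC80..U+DCFF into the original bytes."""
    return name.encode("utf-8", errors=HANDLER)


def can_smuggle_dotdot() -> bool:
    """Return True if U+DC2E U+DC2E U+DC2F would encode to b'../'."""
    disguised = "\udc2e\udc2e\udc2f"
    try:
        return encode_filename(disguised) == b"../"
    except UnicodeEncodeError:
        # Only U+DC80..U+DCFF are accepted, so ASCII bytes cannot be forged.
        return False


if __name__ == "__main__":
    # A Latin-1 byte that is not valid UTF-8 survives the round trip.
    name = decode_filename(b"caf\xe9")
    print(ascii(name), encode_filename(name))

    # normpath sees no ".." component in the disguised string ...
    print(ascii(posixpath.normpath("a/\udc2e\udc2e\udc2f")))
    # ... and the disguised string cannot become "../" on encoding either.
    print(can_smuggle_dotdot())
```

The cost of this design is the one argued about above: encoding a string that contains such a surrogate raises an exception, even though the handler exists to avoid exceptions.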
See also my reply to Lino Mastrodomenico.
URL?
But you're writing the PEP, so this battle will have to be deferred. Eventually Python will have to take a stand on Unicode conformance, but it's not urgent yet.
I think it's always applications that are conforming or not, rather than libraries. Libraries should make it possible to write conforming applications. They may refuse to support certain non-conforming applications (although users then replace the library with one that does allow them to do what they want). Libraries can never enforce that applications conform to some standard.
Sorry! I suggest substituting the paragraph above for the paragraph which begins "The encode error handler interface presently requires..." at line 129.
Ah, ok. This was Glenn Linderman's text before - now it's yours :-)
I think I forgot to do this before: "I hereby dedicate all text I suggest for inclusion in the PEP to the public domain."
:-)
Martin