[Python-Dev] Security implications of pep 383

"Martin v. Löwis" martin at v.loewis.de
Tue Mar 29 20:56:41 CEST 2011


Not sure how real the security risk is here:

http://blog.omega-prime.co.uk/?p=107

Basically he is saying that if you store a list of blacklisted file names encoded in Big5 (or some other encoding that is not UTF-8 compatible), and those names are then passed on the command line, or otherwise read in and decoded from an assumed-UTF-8 source with surrogate escaping, the surrogate-escaped names will not match the properly decoded blacklisted names.
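To make the mismatch concrete, here is a minimal sketch. The file name is a hypothetical example, and the two decode calls merely stand in for "the blacklist was read as Big5" and "Python decoded a command-line argument under a UTF-8 locale"; the blog post's actual program may differ.

```python
# Hypothetical file name, chosen so that its Big5 bytes are not valid UTF-8.
name = "中文.txt"
raw = name.encode("big5")          # the bytes as stored on disk / typed on argv

# Blacklist entry, assuming the blacklist file was read as Big5.
blacklisted = raw.decode("big5")

# The same bytes as Python would see them in sys.argv under a UTF-8 locale:
# undecodable bytes become lone surrogates U+DC80..U+DCFF (PEP 383).
from_argv = raw.decode("utf-8", "surrogateescape")

# The two strings denote the same file but do not compare equal.
mismatch = from_argv != blacklisted

# The escaped string still round-trips to the original bytes.
roundtrip = from_argv.encode("utf-8", "surrogateescape") == raw

print(ascii(blacklisted), ascii(from_argv), mismatch, roundtrip)
```

The comparison fails even though no information was lost: the surrogate-escaped string can be turned back into exactly the original bytes.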

As described, I find the problem a little bit artificial: supposedly, he was passing the file name on the command line. However, since his terminal is in UTF-8 and the file name is in Big5, the console didn't display the file name in a meaningful way when he ran the program. So whoever ran the program ignored the mojibake, and didn't wonder whether it could have any effect on the proper functioning of the program. In addition, if he had run ls(1) on the directory, it would have displayed question marks throughout. This should alert the user that something bad is going on.

Notice that this isn't really PEP 383's fault. If the file system encoding were UTF-8, and the blacklist were UTF-8, and the program ran in a Latin-1 locale, it would have decoded the file name nicely (without surrogates), but the blacklist check would still have failed.
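The Latin-1 case can be sketched the same way. Again the file name is hypothetical, and the decode calls only simulate what a blacklist reader and a Latin-1 locale would do:

```python
# Hypothetical file name stored on disk as UTF-8.
name = "café.txt"
raw = name.encode("utf-8")

# Blacklist entry, assuming the blacklist file was read as UTF-8.
blacklisted = raw.decode("utf-8")

# The same bytes decoded as a Latin-1 locale would decode them.
# Latin-1 maps every byte to a character, so the error handler never fires.
from_argv = raw.decode("latin-1", "surrogateescape")

has_surrogates = any(0xDC80 <= ord(c) <= 0xDCFF for c in from_argv)
mismatch = from_argv != blacklisted

print(from_argv, has_surrogates, mismatch)
```

No surrogate appears anywhere, yet the check fails just the same, because the two sides were decoded with different encodings.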

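He should have opened the blacklist file in the locale's encoding (i.e. giving no encoding argument), using the surrogateescape error handler. That way the blacklist entries are decoded the same way as the command-line arguments, and the comparison works.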
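A minimal sketch of that suggestion follows. The function name load_blacklist and the one-name-per-line file format are assumptions made for illustration, not something taken from the blog post:

```python
def load_blacklist(path):
    """Read one file name per line, decoded like command-line arguments.

    No encoding is passed, so open() uses the locale's encoding, and
    errors="surrogateescape" maps undecodable bytes to lone surrogates
    instead of raising UnicodeDecodeError.
    """
    with open(path, errors="surrogateescape") as f:
        return {line.rstrip("\n") for line in f}


def is_blacklisted(name, blacklist):
    # Both sides were decoded with the same encoding and error handler,
    # so plain string comparison is sufficient.
    return name in blacklist
```

This assumes the arguments and the blacklist are decoded with the same encoding; names taken from sys.argv use the file system encoding, which normally matches the locale's encoding on POSIX systems.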

Regards, Martin


