[Python-Dev] Security implications of pep 383

"Martin v. Löwis" martin at v.loewis.de
Tue Mar 29 23:17:32 CEST 2011


> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' != '\N{LATIN SMALL LETTER O WITH DIAERESIS}'
>
> I guess the filesystem shouldn't treat these as the same (even though they are canonically equivalent), but what if some web service does? I suspect you should normalize both strings before comparing them against any blacklist. And what happens with surrogates when you normalize?

I think the whole blacklist example is artificial. The string in the blacklist is actually a Chinese "hello" greeting, so it surely isn't the string being blacklisted. For proper blacklisting, you would likely use substring searches, case-insensitivity, transliterations, and perhaps even regular expressions and word stemming. If you consider all these things, proper or alternative encodings of the same text are just another issue to consider.
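
To make the normalization question concrete, here is a minimal sketch using the standard unicodedata module. It assumes Python 3 and NFC as the normalization form; the byte string b'caf\xe9' is a made-up example of an undecodable file name, not something from this thread. It compares the two spellings of the o with diaeresis before and after normalizing, and then normalizes a string carrying a PEP 383 lone surrogate.

```python
import unicodedata

# The two spellings from the quoted comparison.
decomposed = '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}'
precomposed = '\N{LATIN SMALL LETTER O WITH DIAERESIS}'

# As plain code point sequences the strings differ ...
raw_equal = decomposed == precomposed

# ... but they compare equal once both sides are normalized to NFC.
normalized_equal = (
    unicodedata.normalize('NFC', decomposed)
    == unicodedata.normalize('NFC', precomposed)
)

# PEP 383: an undecodable byte is mapped to a lone surrogate.
# b'caf\xe9' is an illustrative Latin-1 name that is not valid UTF-8.
escaped = b'caf\xe9'.decode('utf-8', 'surrogateescape')

# Normalizing the escaped string, then encoding it back to bytes.
normalized_escaped = unicodedata.normalize('NFC', escaped)
round_trip = normalized_escaped.encode('utf-8', 'surrogateescape')

print(raw_equal, normalized_equal)
print(ascii(escaped), ascii(normalized_escaped), round_trip)
```

In CPython the lone surrogate has no decomposition, so it should pass through normalization unchanged and the original bytes can still be recovered. Whether a blacklist should match on the escaped form at all is a separate policy question.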

Regards, Martin


