Issue 1519069: incorrect locale.strcoll() return in Windows

Created on 2006-07-08 03:04 by pez4brian, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg29095 - (view)	Author: Brian Matherly (pez4brian)	Date: 2006-07-08 03:04
Python 2.4.2 in Windows (English locale):

>>> import locale
>>> locale.setlocale(locale.LC_ALL,'C')
'C'
>>> locale.setlocale(locale.LC_ALL,'')
'English_United States.1252'
>>> locale.strcoll("M","m")
1
>>> locale.strcoll("Ma","mz")
-1

It appears that when a string has one character, "M" is greater than "m", but when it has more than one string, "M" is equal to "m"
msg29096 - (view)	Author: Brian Matherly (pez4brian)	Date: 2006-07-08 03:08
Correction: It appears that when a string has one character, "M" is greater than "m", but when it has more than one character, "M" is equal to "m"
msg29097 - (view)	Author: Brian Matherly (pez4brian)	Date: 2006-07-08 03:35
I see the same problem in Python 2.4.3
msg29098 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2006-07-14 15:55
Why do you think this is a bug? We pass the string as-is to the C library, which passes it nearly as-is to CompareStringW. This function then decides how they collate; in Microsoft's definition of the English_United States locale, these strings do have the order you get.

In case you wonder how the order is computed: essentially, the strings are first compared case-insensitively, ignoring diacritics. If they compare equal, the diacritics are considered. If they still compare equal, case weights are considered. If they still compare equal, special weights are considered. (Note: I obtained this indirectly by looking at the LCMapString documentation, assuming that CompareString uses LCMapString with LCMAP_SORTKEY|SORT_STRINGSORT.)
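The multi-level ordering described in this message can be illustrated with a small Python model. This is only a sketch under stated assumptions: it is not the algorithm CompareStringW actually runs, the names multilevel_key and multilevel_cmp are hypothetical, special weights are left out, and lowercase is assumed to sort before uppercase at the case level (which matches the strcoll("M","m") result reported above).

```python
import unicodedata


def multilevel_key(text):
    """Hypothetical three-level sort key: letters, then diacritics, then case.

    Simplified model only; real Windows sort keys also carry special weights.
    """
    letters, marks, cases = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and marks:
            # Attach a combining mark to the base character before it.
            marks[-1] += ch
        else:
            letters.append(ch.lower())   # level 1: case and accents ignored
            marks.append("")             # level 2: diacritics
            cases.append(ch.isupper())   # level 3: lowercase < uppercase (assumed)
    # Tuples compare level by level, so a later level only breaks ties.
    return (letters, marks, cases)


def multilevel_cmp(a, b):
    """Return -1, 0 or 1, in the style of locale.strcoll()."""
    ka, kb = multilevel_key(a), multilevel_key(b)
    return (ka > kb) - (ka < kb)
```

Under this model "M" and "m" tie on letters and diacritics, so case decides; "Ma" and "mz" already differ on letters ("a" before "z"), so case is never consulted. That reproduces both results from the original report.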
msg29099 - (view)	Author: Brian Matherly (pez4brian)	Date: 2006-07-15 03:14
Thanks for your response. That is simply unacceptable. Who at Microsoft needs to be flogged? More likely, this shows my lack of understanding of strings and locale in general. Your explanation does explain the results I get, but wouldn't you admit that the results seem wrong? By the definition given, the strings "Ma", "mb", "Mc", "md" would actually sort in that order! So the list of sorted strings would have alternating capitalization! However, the list of strings "M", "m", "M", "m" would sort as "M", "M", "m", "m" - no alternating capitalization - as I would expect. Would there happen to be some way to sort the strings using the locale, but also using the case earlier in the computation order? Basically, I want the sort to be case sensitive. Thanks again for your response. If you have any suggestions that might help me achieve what I want, it would be greatly appreciated.
msg29100 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2006-07-15 07:37
You should ask these questions in some Win32 programmer newsgroup. I don't know whether this sorting is correct or not, I'm not a native English speaker.
msg29101 - (view)	Author: Brian Matherly (pez4brian)	Date: 2006-07-18 02:52
I think you are right - it's probably a Windows issue - if it is an issue at all. I don't claim to be a lingual expert. But I would prefer a case sensitive comparison. So I wrote a function. It looks like this:

import locale

def strcoll_case_sensitive(string1, string2):
    """
    This function was written because string comparisons in Windows
    seem to be case insensitive if the string is longer than one
    character.
    """
    # First, compare the first character
    # (slicing instead of indexing avoids an IndexError on empty strings)
    diff = locale.strcoll(string1[:1], string2[:1])
    if diff == 0:
        # If the first character is the same, compare the rest
        diff = locale.strcoll(string1, string2)
    return diff

Thanks for your help. Feel free to close this bug.
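For sorting a list, the same two-step idea (first character, then the whole string) could also be written as a key function built on locale.strxfrm(). This is a minimal sketch, not something proposed in the thread: the name case_first_key is hypothetical, and it assumes strxfrm() orders strings the same way strcoll() does in the active locale. Like the comparison function in this message, it only makes the first character case sensitive.

```python
import locale


def case_first_key(text):
    """Hypothetical sort key: collate on the first character, then the full string."""
    # text[:1] is "" for an empty string, so no IndexError is raised.
    return (locale.strxfrm(text[:1]), locale.strxfrm(text))
```

To use it, select the locale once with locale.setlocale(locale.LC_ALL, '') and then call sorted(words, key=case_first_key). The resulting order depends on the platform's collation tables.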

History
Date	User	Action	Args
2022-04-11 14:56:18	admin	set	github: 43636
2006-07-08 03:04:50	pez4brian	create