msg202467
Author: Mike FABIAN (mfabian)
Date: 2013-11-09 08:02
Originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1024667

I found that Serbian translations in Latin script do not work when the locale name is written as sr_RS.UTF-8@latin (one gets the Cyrillic translations instead), but they *do* work when the locale name is written as sr_RS@latin (i.e. omitting the '.UTF-8'):

    $ LANG='sr_RS.UTF-8' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
    Који језик бисте желели да користите током процеса инсталације?
    mfabian@ari:~ $ LANG='sr_RS.UTF-8@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
    Који језик бисте желели да користите током процеса инсталације?
    mfabian@ari:~ $ LANG='sr_RS@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
    Koji jezik biste želeli da koristite tokom procesa instalacije?
    mfabian@ari:~ $

The “gettext” command line tool does not have this problem:

    mfabian@ari:~ $ LANG='sr_RS@latin' gettext anaconda "What language would you like to use during the installation process?"
    Koji jezik biste želeli da koristite tokom procesa instalacije?
    mfabian@ari:~ $ LANG='sr_RS.UTF-8@latin' gettext anaconda "What language would you like to use during the installation process?"
    Koji jezik biste želeli da koristite tokom procesa instalacije?
    mfabian@ari:~ $ LANG='sr_RS.UTF-8' gettext anaconda "What language would you like to use during the installation process?"
    Који језик бисте желели да користите током процеса инсталације?
    mfabian@ari:~ $
|
|
msg202468
Author: Mike FABIAN (mfabian)
Date: 2013-11-09 08:05
The problem turns out to be caused by incorrect normalization of the locale name; see the output of this test program:

    mfabian@ari:~ $ cat ~/tmp/mike-test.py
    #!/usr/bin/python2

    import sys
    import os
    import locale
    import encodings
    import encodings.aliases

    test_locales = [
        'ja_JP.UTF-8',
        'de_DE.SJIS',
        'de_DE.foobar',
        'sr_RS.UTF-8@latin',
        'sr_rs@latin',
        'sr@latin',
        'sr_yu',
        'sr_yu.SJIS@devanagari',
        'sr@foobar',
        'sR@foObar',
        'sR',
    ]

    for test_locale in test_locales:
        print("%(orig)s -> %(norm)s" % {'orig': test_locale, 'norm': locale.normalize(test_locale)})
    mfabian@ari:~ $ python2 ~/tmp/mike-test.py
    ja_JP.UTF-8 -> ja_JP.UTF-8
    de_DE.SJIS -> de_DE.SJIS
    de_DE.foobar -> de_DE.foobar
    sr_RS.UTF-8@latin -> sr_RS.utf_8_latin
    sr_rs@latin -> sr_RS.UTF-8@latin
    sr@latin -> sr_RS.UTF-8@latin
    sr_yu -> sr_RS.UTF-8@latin
    sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
    sr@foobar -> sr@foobar
    sR@foObar -> sR@foObar
    sR -> sr_RS.UTF-8
    mfabian@ari:~ $

I.e. “sr_RS.UTF-8@latin” is normalized to “sr_RS.utf_8_latin”, which is clearly wrong: “@latin” no longer appears as a modifier, so gettext falls back to sr_RS and returns the Cyrillic translations.
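Why a mangled name ends in Cyrillic text can be illustrated with the fallback search gettext performs: it builds candidate catalog names from the normalized locale by dropping optional components and uses the first one for which a catalog exists. The sketch below is a simplified re-implementation written for illustration only; the name expand_locale is hypothetical, and the candidate order is modeled on, not copied from, the search used by Python's gettext module.

```python
def expand_locale(loc):
    """Return candidate catalog names for a normalized locale name,
    most specific first (simplified model of gettext's search order)."""
    # Split language[_territory][.codeset][@modifier]; missing parts are ''.
    rest, sep, modifier = loc.partition("@")
    modifier = sep + modifier            # '' or e.g. '@latin'
    rest, sep, codeset = rest.partition(".")
    codeset = sep + codeset              # '' or e.g. '.UTF-8'
    language, sep, territory = rest.partition("_")
    territory = sep + territory          # '' or e.g. '_RS'

    candidates = []
    # The modifier is the most significant component, then territory,
    # then codeset; a component is used only if it was present in `loc`.
    for use_mod in (True, False):
        for use_terr in (True, False):
            for use_cs in (True, False):
                missing = ((use_mod and not modifier)
                           or (use_terr and not territory)
                           or (use_cs and not codeset))
                if missing:
                    continue
                candidates.append(
                    language
                    + (territory if use_terr else "")
                    + (codeset if use_cs else "")
                    + (modifier if use_mod else "")
                )
    return candidates
```

For the correctly normalized name, expand_locale("sr_RS.UTF-8@latin") tries sr_RS@latin and sr@latin before any plain sr_RS or sr. For the mangled name it returns ['sr_RS.utf_8_latin', 'sr_RS', 'sr.utf_8_latin', 'sr']: no @latin catalog is ever tried, so only Cyrillic catalogs can match.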
|
|
msg202469
Author: Mike FABIAN (mfabian)
Date: 2013-11-09 08:09
A simple fix for that problem could look like this:

    mfabian@ari:~ $ diff -u /usr/lib64/python2.7/locale.py.orig /usr/lib64/python2.7/locale.py
    --- /usr/lib64/python2.7/locale.py.orig 2013-11-09 09:08:24.807331535 +0100
    +++ /usr/lib64/python2.7/locale.py 2013-11-09 09:08:34.526390646 +0100
    @@ -377,7 +377,7 @@
         # First lookup: fullname (possibly with encoding)
         norm_encoding = encoding.replace('-', '')
         norm_encoding = norm_encoding.replace('_', '')
    -    lookup_name = langname + '.' + encoding
    +    lookup_name = langname + '.' + norm_encoding
         code = locale_alias.get(lookup_name, None)
         if code is not None:
             return code
    @@ -1457,6 +1457,7 @@
         'sr_cs@latn': 'sr_RS.UTF-8@latin',
         'sr_me': 'sr_ME.UTF-8',
         'sr_rs': 'sr_RS.UTF-8',
    +    'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',
         'sr_rs.utf8@latn': 'sr_RS.UTF-8@latin',
         'sr_rs@latin': 'sr_RS.UTF-8@latin',
         'sr_rs@latn': 'sr_RS.UTF-8@latin',
    mfabian@ari:~ $
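The effect of the one-line change to lookup_name can be shown with a minimal model of the first lookup. This is a sketch, not locale.py itself: LOCALE_ALIAS is a hypothetical two-entry table (including the new 'sr_rs.utf8@latin' entry from the diff), and first_lookup and its use_norm_encoding flag are hypothetical names; passing use_norm_encoding=False reproduces the original behavior.

```python
# Hypothetical two-entry stand-in for locale.locale_alias; as the comment
# in locale.py says, keys store the encoding with '-' and '_' removed.
LOCALE_ALIAS = {
    "sr_rs.utf8@latin": "sr_RS.UTF-8@latin",
    "sr_rs.utf8@latn": "sr_RS.UTF-8@latin",
}


def first_lookup(localename, use_norm_encoding=True):
    """Model of normalize()'s first lookup; use_norm_encoding=False
    reproduces the original bug."""
    fullname = localename.lower()
    if "." in fullname:
        langname, encoding = fullname.split(".")[:2]
    else:
        langname, encoding = fullname, ""
    norm_encoding = encoding.replace("-", "").replace("_", "")
    # The bug: the key was built from `encoding` instead of `norm_encoding`.
    key_encoding = norm_encoding if use_norm_encoding else encoding
    return LOCALE_ALIAS.get(langname + "." + key_encoding)
```

With the fix, both 'sr_RS.UTF-8@latin' and 'sr_RS.utf_8@latn' reach their table entries; without it, only spellings that already match a stored key exactly (such as 'sr_RS.utf8@latn') are found, and everything else falls through to the second lookup.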
|
|
msg202470
Author: Mike FABIAN (mfabian)
Date: 2013-11-09 08:15
In locale.py, the comment above “locale_alias = {” says:

    # Note that the normalize() function which uses this tables
    # removes '_' and '-' characters from the encoding part of the
    # locale name before doing the lookup. This saves a lot of
    # space in the table.

But in normalize(), this is actually not done:

    # First lookup: fullname (possibly with encoding)
    norm_encoding = encoding.replace('-', '')
    norm_encoding = norm_encoding.replace('_', '')
    lookup_name = langname + '.' + encoding
    code = locale_alias.get(lookup_name, None)

“norm_encoding” holds the encoding part of the locale name with these characters removed, but then it is not used in the lookup. The patch in http://bugs.python.org/msg202469 fixes that: using norm_encoding, together with adding the alias

    +    'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',

makes it work for sr_RS.UTF-8@latin. My test program then outputs:

    mfabian@ari:~ $ python2 ~/tmp/mike-test.py
    ja_JP.UTF-8 -> ja_JP.UTF-8
    de_DE.SJIS -> de_DE.SJIS
    de_DE.foobar -> de_DE.foobar
    sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
    sr_rs@latin -> sr_RS.UTF-8@latin
    sr@latin -> sr_RS.UTF-8@latin
    sr_yu -> sr_RS.UTF-8@latin
    sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
    sr@foobar -> sr@foobar
    sR@foObar -> sR@foObar
    sR -> sr_RS.UTF-8
    mfabian@ari:~ $

Note, however, that the normalization of “sr_yu.SJIS@devanagari” is still odd: it yields “sr_RS.sjis_devanagari”. Of course “sr_yu.SJIS@devanagari” is a silly locale name that does not exist anyway, but it shows that the code in normalize() does not work as intended.
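To see why the modifier ends up inside the encoding, here is a simplified model of the second lookup in the 2.7 normalize(), the fallback taken when the full name is not in the table. It is written for illustration only: the three-entry ALIASES table and old_style_normalize are hypothetical, and the regular expression merely approximates encodings.normalize_encoding() (each run of characters other than letters, digits and '.' becomes a single '_'). Because the name is split only on '.', the '@modifier' stays attached to the encoding and is turned into '_modifier'.

```python
import re

# Hypothetical, heavily trimmed stand-in for locale.locale_alias.
ALIASES = {
    "sr": "sr_RS.UTF-8",
    "sr_rs": "sr_RS.UTF-8",
    "sr_yu": "sr_RS.UTF-8@latin",
}


def old_style_normalize(localename):
    """Simplified model of the 2.7 second lookup (language-only key)."""
    fullname = localename.lower()
    # Only '.' is split off, so '@modifier' stays inside `encoding`.
    if "." in fullname:
        langname, encoding = fullname.split(".")[:2]
    else:
        langname, encoding = fullname, ""
    code = ALIASES.get(langname)
    if code is None:
        return localename
    if not encoding:
        return code
    base = code.split(".")[0]
    # Approximates encodings.normalize_encoding(): each run of characters
    # other than ASCII letters, digits and '.' becomes a single '_'.
    encoding = re.sub(r"[^0-9a-z.]+", "_", encoding).strip("_")
    return base + "." + encoding
```

With this model, old_style_normalize("sr_RS.UTF-8@latin") gives "sr_RS.utf_8_latin" and old_style_normalize("sr_yu.SJIS@devanagari") gives "sr_RS.sjis_devanagari", the same strings the test program prints.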
|
|
msg202471
Author: Mike FABIAN (mfabian)
Date: 2013-11-09 08:22
I think the patch I attach here is a better fix than the patch in http://bugs.python.org/msg202469 because it makes the normalize() function behave more logically overall. With this patch, my test program prints:

    mfabian@ari:/local/mfabian/src/cpython (2.7-mike %) $ ./python ~/tmp/mike-test.py
    ja_JP.UTF-8 -> ja_JP.UTF-8
    de_DE.SJIS -> de_DE.SJIS
    de_DE.foobar -> de_DE.foobar
    sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
    sr_rs@latin -> sr_RS.UTF-8@latin
    sr@latin -> sr_RS.UTF-8@latin
    sr_yu -> sr_RS.UTF-8@latin
    sr_yu.SJIS@devanagari -> sr_RS.SJIS@devanagari
    sr@foobar -> sr_RS.UTF-8@foobar
    sR@foObar -> sr_RS.UTF-8@foobar
    sR -> sr_RS.UTF-8
    [18995 refs]
    mfabian@ari:/local/mfabian/src/cpython (2.7-mike %) $

The patch also contains a small fix for the “ks” and “sd” locales in the locale_alias dictionary; they had the “.UTF-8” in the wrong place:

    -    'ks_in@devanagari': 'ks_IN@devanagari.UTF-8',
    +    'ks_in@devanagari': 'ks_IN.UTF-8@devanagari',

    -    'sd': 'sd_IN@devanagari.UTF-8',
    +    'sd': 'sd_IN.UTF-8@devanagari',

(This error is inherited from the X.org locale.alias file, from which the locale_alias dictionary is generated.)
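The attached patch itself is not reproduced in this thread. As a rough illustration of one way to obtain output like the patched results above (split off the '@modifier' first, normalize the encoding on its own, then reassemble language_TERRITORY.ENCODING@modifier), here is a sketch under stated assumptions: the tiny ALIASES and ENCODING_ALIASES tables and all function names are hypothetical, and the real locale.py tables and rules are far larger.

```python
# Hypothetical, heavily trimmed stand-ins for locale.locale_alias and
# locale.locale_encoding_alias; the real tables are much larger.
ALIASES = {
    "sr": "sr_RS.UTF-8",
    "sr_rs": "sr_RS.UTF-8",
    "sr_rs@latin": "sr_RS.UTF-8@latin",
    "sr_yu": "sr_RS.UTF-8@latin",
}
ENCODING_ALIASES = {"utf8": "UTF-8", "sjis": "SJIS"}


def split_locale(name):
    """Split 'lang[_TERRITORY][.encoding][@modifier]' into
    (lang_territory, encoding, modifier); missing parts are ''."""
    rest, _, modifier = name.partition("@")
    lang, _, encoding = rest.partition(".")
    return lang, encoding, modifier


def canonical_encoding(encoding):
    """Map e.g. 'utf-8' or 'utf_8' to 'UTF-8'; unknown names pass through."""
    key = encoding.lower().replace("-", "").replace("_", "")
    return ENCODING_ALIASES.get(key, encoding)


def normalize_sketch(localename):
    lang, encoding, modifier = split_locale(localename.lower())
    # Prefer an alias that includes the modifier, then fall back to lang alone.
    code = ALIASES.get(lang + "@" + modifier) if modifier else None
    if code is None:
        code = ALIASES.get(lang)
    if code is None:
        return localename  # unknown language: leave the name untouched
    base, default_encoding, default_modifier = split_locale(code)
    # An explicitly requested encoding or modifier overrides the alias default.
    enc = canonical_encoding(encoding) if encoding else default_encoding
    mod = modifier or default_modifier
    result = base
    if enc:
        result += "." + enc
    if mod:
        result += "@" + mod
    return result
```

For the inputs its small tables cover, this sketch gives the same results as the patched run above, e.g. "sr_yu.SJIS@devanagari" becomes "sr_RS.SJIS@devanagari" and "sR@foObar" becomes "sr_RS.UTF-8@foobar". Keeping the modifier separate is also what makes the ks/sd fix matter: split_locale only parses values written in the lang_TERRITORY.ENCODING@modifier order.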
|
|
msg202472
Author: Mike FABIAN (mfabian)
Date: 2013-11-09 08:24
The patch http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch is against the current HEAD of the 2.7 branch, but Python 3.3 has exactly the same problem, and the same patch fixes it for Python 3.3 as well.
|
|
msg202473
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2013-11-09 08:44
Seems this is a duplicate of .
|
|
msg208116
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2014-01-14 21:38
locale.normalize() was fixed in (and a new entry for 'sr_RS.UTF-8@latin' is no longer needed). Devanagari entries were fixed in . In any case, thank you, Mike, for your report and proposed patch.
|
|