[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
Martin v. Löwis martin@v.loewis.de
12 Apr 2003 13:43:28 +0200
Barry Warsaw <barry@python.org> writes:
> I used standard msgfmt to turn that into a .mo file. Then created a GNUTranslation(fp, coerce=True) and called
>
> >>> t.ugettext(u'ab\xde')
> u'\xa4yz'
>
> This is what I should expect, right? ;)
More or less, yes. Now, what happens if you put "real" non-ASCII characters (i.e. bytes above 127) into the message id, like so:
    msgid "abÞ"
    msgstr "\xc2\xa4yz"
msgfmt will still accept that, but msgunfmt will complain:
msgunfmt: warning: The following msgid contains non-ASCII characters. This will cause problems to translators who use a character encoding different from yours. Consider using a pure ASCII msgid instead.
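To make the example concrete, an entry like the one above can be fed to Python 3's `gettext.GNUTranslations`, which decodes both msgids and msgstrs using the charset= from the catalog header. This is a sketch under stated assumptions, not a reproduction of the msgfmt/msgunfmt tools: the hypothetical `build_mo` helper writes a minimal .mo image in memory (no hash table, no plural forms), the catalog charset is assumed to be UTF-8, and `non_ascii_msgids` is a hypothetical check in the spirit of the msgunfmt warning, not part of any gettext API.

```python
import io
import struct
import gettext

LE_MAGIC = 0x950412DE  # magic number of a little-endian .mo file

def build_mo(messages):
    """Serialize {msgid_bytes: msgstr_bytes} into a minimal GNU .mo image.

    Hypothetical stand-in for msgfmt: no hash table, no plural forms.
    """
    keys = sorted(messages)  # GNU gettext expects msgids in sorted order
    n = len(keys)
    ids = strs = b""
    entries = []
    for k in keys:
        entries.append((len(k), len(ids), len(messages[k]), len(strs)))
        ids += k + b"\0"              # every string is NUL-terminated
        strs += messages[k] + b"\0"
    orig_table = 7 * 4                # header is seven 32-bit words
    trans_table = orig_table + 8 * n  # each table entry is (length, offset)
    ids_start = trans_table + 8 * n
    strs_start = ids_start + len(ids)
    index = []
    for klen, koff, _, _ in entries:
        index += [klen, ids_start + koff]
    for _, _, vlen, voff in entries:
        index += [vlen, strs_start + voff]
    # magic, revision 0, count, table offsets, hash table size/offset (unused)
    header = struct.pack("<7I", LE_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    return header + struct.pack(f"<{len(index)}I", *index) + ids + strs

def non_ascii_msgids(messages):
    """Return the msgids that contain any byte above 127 (hypothetical check)."""
    return [k for k in messages if any(b > 127 for b in k)]

# The empty msgid carries the catalog header, which declares the charset.
# "\xde" is THORN, so the msgid is "abÞ" encoded as UTF-8.
catalog = {
    b"": b"Content-Type: text/plain; charset=UTF-8\n",
    "ab\xde".encode("utf-8"): b"\xc2\xa4yz",
}

# The .mo format itself has no objection, and Python 3 decodes the msgid
# with the declared charset, so a str lookup succeeds.
t = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(ascii(t.gettext("ab\xde")))  # '\xa4yz'

for msgid in non_ascii_msgids(catalog):
    print("warning: msgid", msgid, "contains non-ASCII characters")
```

The lookup only works because the decoder knows which charset the msgid bytes are in; nothing in the msgid itself records that.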
If you think about it, this is really bad: if you mean to apply the charset= to both msgid and msgstr, then translators using a charset different from yours are in big trouble.
They are faced with three problems:
- They don't know what the charset of the msgids is. The PO files do have a charset declaration, but the POT files typically don't.
- They need to convert the msgids from the POT encoding to their native encoding. No readily available tools support that: iconv might convert the msgids correctly, but it won't update the charset= in the POT file (if the charset was filled out).
- By converting the msgids, they are also changing them. That means the msgids are not really suitable as keys anymore.
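The second and third problems can be illustrated at the byte level. This is a hypothetical scenario, not taken from the thread: the original catalog is assumed to be UTF-8, a translator is assumed to have re-encoded it to ISO-8859-1 (Latin-1) with iconv, and the plain dict stands in for a byte-keyed catalog such as the one C gettext searches.

```python
# Hypothetical scenario: the program's source uses the msgid "abÞ"
# ("\xde" is THORN), and the original catalog was written in UTF-8.
msgid = "ab\xde"
original_key = msgid.encode("utf-8")     # b'ab\xc3\x9e'
converted_key = msgid.encode("latin-1")  # b'ab\xde', after the iconv run

# A byte-keyed catalog built from the converted file.
catalog = {converted_key: "\xa4yz".encode("latin-1")}

# Problem 3: the program still looks up its UTF-8 bytes, so the key misses
# and the untranslated msgid comes back.
print(catalog.get(original_key, original_key))  # b'ab\xc3\x9e'

# Problem 2: if charset= was not updated and still says UTF-8, decoding
# the converted msgid fails outright.
try:
    converted_key.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode msgid:", exc.reason)
```

Python 3's gettext decodes msgids before lookup, which hides the byte mismatch only while charset= is accurate; with a stale declaration like the one above, loading the catalog can fail with a UnicodeDecodeError.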
Regards, Martin