[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

Martin v. Löwis martin@v.loewis.de
12 Apr 2003 13:43:28 +0200


Barry Warsaw <barry@python.org> writes:

> I used standard msgfmt to turn that into a .mo file. Then I created a
> GNUTranslations(fp, coerce=True) instance and called
>
> >>> t.ugettext(u'ab\xde')
> u'\xa4yz'
>
> This is what I should expect, right? ;)

More or less, yes. Now, what happens if you put "real" non-ASCII (i.e. bytes above 127) into the message id, like so:

msgid "abÞ"
msgstr "\xc2\xa4yz"

msgfmt will still accept that, but msgunfmt will complain:

msgunfmt: warning: The following msgid contains non-ASCII characters. This will cause problems to translators who use a character encoding different from yours. Consider using a pure ASCII msgid instead.
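The warning is about raw bytes above 127 inside a msgid. A rough approximation of that check might look like the sketch below. It is not msgunfmt's actual logic: `msgids` and `non_ascii_msgids` are hypothetical names, and the reader is deliberately simplified (it handles `msgid "..."` plus continuation lines only, and ignores `msgctxt`, plural forms, and escape sequences).

```python
def msgids(po_bytes: bytes):
    """Yield each msgid in a PO file as raw bytes (simplified parser)."""
    current = None  # pieces of the msgid being collected, or None
    for line in po_bytes.splitlines():
        line = line.strip()
        if line.startswith(b'msgid "'):
            if current is not None:
                yield b"".join(current)
            current = [line[len(b'msgid "'):-1]]  # drop keyword and quotes
        elif current is not None and line.startswith(b'"'):
            current.append(line[1:-1])            # continuation line
        elif current is not None:
            yield b"".join(current)               # msgstr etc. ends the msgid
            current = None
    if current is not None:
        yield b"".join(current)


def non_ascii_msgids(po_bytes: bytes) -> list:
    """Return the msgids that contain bytes above 127."""
    return [m for m in msgids(po_bytes) if any(b > 127 for b in m)]


po = (
    'msgid ""\n'
    'msgstr "Content-Type: text/plain; charset=UTF-8\\n"\n'
    "\n"
    'msgid "ab\u00de"\n'
    'msgstr "\\xc2\\xa4yz"\n'
).encode("utf-8")
print(non_ascii_msgids(po))  # [b'ab\xc3\x9e']
```

Note that the check works on bytes, not characters: the same msgid would be flagged whatever encoding it happens to be stored in.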

If you think about it, this is really bad: if you mean the charset= to apply to both msgid and msgstr, then translators who use a different charset from yours are in big trouble.

They are faced with three problems:

  1. They don't know what the charset of the msgids is. The PO files do have a charset declaration, but the POT files typically don't.
  2. They need to convert the msgids from the POT encoding to their native encoding. No tools are readily available for that: tools like iconv might convert the msgids correctly, but won't update the charset= in the POT file (if the charset was filled out at all).
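  3. By converting the msgids, they are also changing them. Since the msgid is the lookup key, a converted msgid no longer matches the string the program passes in at runtime, so the msgids are not really suitable as keys anymore.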
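Problems 2 and 3 can be reproduced with Python 3's gettext module, which (unlike the `ugettext`/`coerce=True` calls in Barry's experiment) decodes both msgid and msgstr with the charset= from the catalog header. This is a sketch under stated assumptions: `build_mo` is a hypothetical helper that writes a minimal little-endian .mo image with no hash table; the POT is assumed to be UTF-8; and the translator is assumed to convert it to ISO-8859-1, as iconv would, without updating charset=.

```python
import io
import struct
import gettext

MO_MAGIC = 0x950412DE  # GNU .mo magic number, written little-endian


def build_mo(catalog: dict) -> bytes:
    """Hypothetical helper: serialize {msgid_bytes: msgstr_bytes} to .mo bytes."""
    keys = sorted(catalog)                # msgids are stored sorted by raw bytes
    n = len(keys)
    orig_table = 7 * 4                    # header: seven 32-bit words
    trans_table = orig_table + 8 * n      # each entry is (length, offset)
    data_start = trans_table + 8 * n
    data = b""
    entries = []
    for strings in (keys, [catalog[k] for k in keys]):  # msgids, then msgstrs
        for s in strings:
            entries.append((len(s), data_start + len(data)))
            data += s + b"\0"             # NUL terminator not counted in length
    out = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, data_start)
    for length, offset in entries:
        out += struct.pack("<2I", length, offset)
    return out + data


HEADER = "Content-Type: text/plain; charset={}\n"
msgid_utf8 = "ab\u00de".encode("utf-8")          # b'ab\xc3\x9e', as in the POT
msgstr = "\u00a4yz"

# Baseline: bytes and charset= agree, so the lookup succeeds.
good = gettext.GNUTranslations(io.BytesIO(build_mo({
    b"": HEADER.format("UTF-8").encode("ascii"),  # header entry has an empty msgid
    msgid_utf8: msgstr.encode("utf-8"),
})))
print(good.gettext("ab\u00de"))                  # ¤yz

# Problem 2: the file was converted to ISO-8859-1 but still claims UTF-8.
msgid_latin1 = "ab\u00de".encode("iso-8859-1")   # b'ab\xde'
stale = build_mo({
    b"": HEADER.format("UTF-8").encode("ascii"),
    msgid_latin1: msgstr.encode("iso-8859-1"),
})
try:
    gettext.GNUTranslations(io.BytesIO(stale))
    stale_loads = True
except UnicodeDecodeError:
    stale_loads = False                          # bytes don't match the declared charset
print("stale catalog loads:", stale_loads)       # stale catalog loads: False

# Problem 3: a byte-for-byte lookup (as C gettext does) misses the converted key.
raw_catalog = {msgid_latin1: msgstr.encode("iso-8859-1")}
print(raw_catalog.get(msgid_utf8))               # None
```

Python only gets the converted catalog right if the translator also fixes charset=; a byte-keyed lookup fails either way, which is the point of problem 3.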

Regards, Martin