[Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3

Barry Warsaw barry@python.org
16 Apr 2003 12:52:06 -0400


On Sat, 2003-04-12 at 07:43, Martin v. Löwis wrote:

More or less, yes. Now, what happens if you put "real" non-ASCII (i.e. bytes above 127) into the message id, like so:

But I don't think you'd ever want to do that. In fact, I think in general you're probably talking about ASCII msgids or utf-8 encoded Unicode msgids. I'm not sure what else would make sense.

msgfmt will still accept that, but msgunfmt will complain:

Didn't even know about msgunfmt. :)

msgunfmt: warning: The following msgid contains non-ASCII characters. This will cause problems to translators who use a character encoding different from yours. Consider using a pure ASCII msgid instead.
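The condition that warning describes can be approximated in a few lines of Python. This is only a sketch, not msgunfmt's actual code; the function name `non_ascii_msgids` and the sample strings are made up for illustration.

```python
def non_ascii_msgids(msgids):
    """Return the msgids that contain any character outside 7-bit ASCII."""
    return [m for m in msgids if any(ord(ch) > 127 for ch in m)]


# Hypothetical sample msgids; the second one contains U+00E9.
samples = ["Open file", "Caf\u00e9 menu", "Quit"]
flagged = non_ascii_msgids(samples)
print(flagged)
```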

If you think about this, this is really bad: If you mean to apply the charset= to both msgid and msgstr, then translators using a different charset from yours are in big trouble.

Right, but see above. E.g. if your string literals are all Spanish and you want a Turkish translation, then utf-8 is the only common encoding you could possibly use in a .po file, right?

They are faced with three problems:
1. They don't know what the charset of the msgids is. The PO files do have a charset declaration, the POT files typically don't.

Yep, although it would be easy for the extractor to add a charset=utf-8 to the pot file.
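As a rough sketch of what that could look like, assuming the extractor simply writes a Content-Type line into the POT header entry: the helpers `pot_header` and `declared_charset` below are hypothetical and are not part of pygettext or gettext.py.

```python
def pot_header(charset="utf-8"):
    """Return a minimal POT header entry declaring the given charset."""
    return (
        'msgid ""\n'
        'msgstr ""\n'
        '"Content-Type: text/plain; charset=%s\\n"\n'
        '"Content-Transfer-Encoding: 8bit\\n"\n' % charset
    )


def declared_charset(header_text):
    """Extract the charset= value from a header entry, or None if absent."""
    for line in header_text.splitlines():
        if "charset=" in line:
            value = line.split("charset=", 1)[1]
            # Drop the literal backslash-n and the closing quote of the PO string.
            return value.replace("\\n", "").strip('"').strip()
    return None


header = pot_header()
print(header)
print(declared_charset(header))
```

With a declaration like this in place, a translator's tools would at least know how to decode the msgids.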

2. They need to convert the msgids from the POT encoding to their native encoding. There are no tools available to support that readily; tools like iconv might correctly convert the msgids, but won't update the charset= in the POT file (if the charset was filled out).
3. By converting the msgids, they are also changing them. That means the msgids are not really suitable as keys anymore.
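The third problem can be illustrated with a small sketch, assuming the catalog is keyed on the raw bytes of the msgid. The msgid, the translation, and the dictionary standing in for a catalog are all made up for illustration.

```python
# Hypothetical non-ASCII msgid (the last character is U+00E9).
msgid = "caf\u00e9"

# The same text as bytes in two different charsets.
utf8_key = msgid.encode("utf-8")
latin1_key = msgid.encode("iso-8859-1")

# A stand-in catalog keyed by the bytes as they appear in a UTF-8 POT file.
catalog = {utf8_key: "coffee shop"}

# After conversion to iso-8859-1 the bytes differ, so the lookup misses.
print(utf8_key == latin1_key)
print(catalog.get(latin1_key))

# Decoding each with its own charset recovers the same text, which is why
# knowing the declared charset matters.
print(utf8_key.decode("utf-8") == latin1_key.decode("iso-8859-1"))
```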

Is this still a problem when charset=utf-8?

-Barry