[Python-Dev] Re: gettext in the standard library

Martin von Loewis loewis@informatik.hu-berlin.de
Sat, 19 Aug 2000 09:25:20 +0200 (MET DST)


> What I'm missing in your doc-string is a reference as to how well gettext works together with Unicode. After all, i18n is among other things about international character sets. Have you done any experiments in this area?

I have, to some degree. As others pointed out, gettext maps byte arrays to byte arrays. However, in the GNU internationalization project, it is the convention to put an entry like

msgid ""
msgstr ""
"Project-Id-Version: GNU grep 2.4\n"
"POT-Creation-Date: 1999-11-13 11:33-0500\n"
"PO-Revision-Date: 1999-12-07 10:10+01:00\n"
"Last-Translator: Martin von Löwis <martin@mira.isdn.cs.tu-berlin.de>\n"
"Language-Team: German <de@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ISO-8859-1\n"
"Content-Transfer-Encoding: 8-bit\n"

into the catalog, which can be accessed as the translation of the empty string. It typically has a charset= element, which makes it possible to determine which character set is used in the catalog. Of course, this is only a convention, so the element may not be present. If it is absent, and conversion to Unicode is requested, it is probably a good idea to assume UTF-8 (as James indicated, that will be the GNOME coded character set for catalogs, for example).
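A rough sketch of that lookup follows. It assumes a Python where byte strings and Unicode strings are distinct types, and that the header entry has already been fetched from the catalog as bytes. The names catalog_charset and to_unicode are made up for illustration and are not part of any proposed module.

```python
import re

# Matches the charset= element of the Content-Type line in the header entry.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def catalog_charset(header, default="utf-8"):
    """Return the charset declared in the catalog header entry.

    `header` is the translation of the empty string, as bytes.
    Since the charset= element is only a convention, fall back to
    `default` (UTF-8 here) when it is missing.
    """
    match = _CHARSET_RE.search(header)
    if match is None:
        return default
    return match.group(1).decode("ascii")


def to_unicode(translated, header):
    """Decode a translated byte string using the catalog's declared charset."""
    return translated.decode(catalog_charset(header))
```

With the header shown above, catalog_charset would report ISO-8859-1; with a header that lacks the element, the UTF-8 default applies.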

In any case, I think it is a good idea to support retrieval of translated strings as Unicode objects. I can think of two alternative interfaces:

gettext.gettext(msgid, unicode=1)
#or
gettext.unigettext(msgid)

Of course, if applications install _, they'd know whether they want unicode or byte strings, so _ would still take a single argument.

However, I don't think that this feature must be there at the first checkin; I'd volunteer to work on a patch after Barry has installed his code, and once I have some indication of what the interface should be.

Regards,
Martin