[Python-Dev] Re: gettext in the standard library

M.-A. Lemburg mal@lemburg.com
Sat, 19 Aug 2000 11:37:28 +0200


"Barry A. Warsaw" wrote:

> >>>>> "M" == M <mal@lemburg.com> writes:
>
>     M> I know that gettext is a standard, but from a technology POV I
>     M> would have implemented this as a codec which can then be plugged
>     M> in wherever l10n is needed, since strings have the new .encode()
>     M> method which could just as well be used to convert the string
>     M> not only into a different encoding, but also into a different
>     M> language. Anyway, just a thought...
>
> That might be cool to play with, but I haven't done anything with Python's Unicode stuff (and painfully little with gettext too), so right now I don't see how they'd fit together. My gut reaction is that gettext could be the lower-level interface to string.encode(language).

Oh, codecs are not just about Unicode. Normal string objects also have an .encode() method which can be used for these purposes as well.
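
A rough sketch of the codec idea in present-day Python, which is an assumption since this thread predates Python 3: str.encode() there must produce bytes, so the text-to-text lookup goes through codecs.encode() instead. The codec name l10n_de and the small CATALOG_DE dictionary are made up for illustration.

```python
import codecs

# Hypothetical in-memory catalog; a real one would be loaded from a file.
CATALOG_DE = {"Hello": "Hallo", "Goodbye": "Auf Wiedersehen"}


def _translate(text, errors="strict"):
    # Codec encoders return (result, amount of input consumed).
    # Unknown messages fall back to the original text.
    return CATALOG_DE.get(text, text), len(text)


def _untranslate(text, errors="strict"):
    reverse = {v: k for k, v in CATALOG_DE.items()}
    return reverse.get(text, text), len(text)


def _search(name):
    # The codec machinery passes a normalized (lowercase) name.
    if name == "l10n_de":
        return codecs.CodecInfo(_translate, _untranslate, name="l10n_de")
    return None


codecs.register(_search)

greeting = codecs.encode("Hello", "l10n_de")
```

Whether a codec is the right home for message lookup is a design question; the sketch only shows that the codec registry can carry a str-to-str mapping.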

>     M> What I'm missing in your doc-string is a reference as to how
>     M> well gettext works together with Unicode. After all, i18n is
>     M> among other things about international character sets.
>     M> Have you done any experiments in this area?

> No, but I've thought about it, and I don't think the answer is good. The GNU gettext functions take and return char*'s, which probably isn't very compatible with Unicode. gettext therefore takes and returns PyStringObjects.

Martin mentioned the possibility of using UTF-8 for the catalogs and then decoding them into Unicode. That should be a reasonable way of getting .gettext() to talk Unicode :-)
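
The UTF-8 route can be sketched with today's standard-library gettext module, which postdates this message and is used here only as an illustration. The helper build_mo below is hypothetical: it packs a dictionary into the GNU .mo layout with a header entry declaring charset=UTF-8, and GNUTranslations then decodes the catalog into Unicode strings.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Pack a {msgid: msgstr} dict into GNU .mo bytes with UTF-8 strings."""
    # The empty msgid holds the header that declares the catalog charset.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(messages)
    keys = sorted(entries)
    ids = [k.encode("utf-8") for k in keys]
    strs = [entries[k].encode("utf-8") for k in keys]

    count = len(keys)
    header_size = 7 * 4                    # seven 32-bit header fields
    id_table_at = header_size
    str_table_at = id_table_at + count * 8
    id_data_at = str_table_at + count * 8
    str_data_at = id_data_at + sum(len(i) + 1 for i in ids)

    id_table, str_table = [], []
    for msgid, msgstr in zip(ids, strs):
        id_table += [len(msgid), id_data_at]
        str_table += [len(msgstr), str_data_at]
        id_data_at += len(msgid) + 1       # +1 for the trailing NUL
        str_data_at += len(msgstr) + 1

    # Magic number, format revision 0, string count, table offsets,
    # then an empty hash table (size 0, offset 0).
    out = struct.pack("<7I", 0x950412DE, 0, count,
                      id_table_at, str_table_at, 0, 0)
    out += struct.pack("<%dI" % len(id_table), *id_table)
    out += struct.pack("<%dI" % len(str_table), *str_table)
    out += b"".join(i + b"\0" for i in ids)
    out += b"".join(s + b"\0" for s in strs)
    return out


mo_bytes = build_mo({"Hello": "Grüße", "Yes": "Да"})
translations = gettext.GNUTranslations(io.BytesIO(mo_bytes))
greeting = translations.gettext("Hello")
```

The catalog on disk stays a plain byte stream, so Unicode-unaware tools can still handle it; only the reader needs to know the charset.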

> We could do better with the pure-Python implementation, and that might be a good reason to forgo any performance gains or platform-dependent benefits you'd get with gettext. Of course the trick is using the Unicode-unaware tools to build .mo files containing Unicode strings. I don't track GNU gettext development closely enough to know whether they are addressing Unicode issues or not.

Just dreaming a little here: I would prefer that we use some form of XML to write the catalogs. XML comes with Unicode support, and tools for writing XML are available too. We'd only need a way to transform XML into catalog files in some Python-specific, platform-independent format (it should be possible to create .mo files from XML too).
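
To make the XML idea a little more concrete, here is a minimal sketch using xml.etree.ElementTree from the current standard library. The catalog layout (the catalog, message, id and text elements) is invented for this example and is not an existing format.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog layout; element and attribute names are made up.
CATALOG_XML = """<?xml version="1.0" encoding="UTF-8"?>
<catalog lang="de">
  <message><id>Hello</id><text>Grüße</text></message>
  <message><id>Goodbye</id><text>Auf Wiedersehen</text></message>
</catalog>
""".encode("utf-8")


def load_catalog(xml_bytes):
    """Parse the XML catalog into a {msgid: translation} dict of str."""
    root = ET.fromstring(xml_bytes)
    return {m.findtext("id"): m.findtext("text")
            for m in root.iter("message")}


def translate(catalog, message):
    # Fall back to the untranslated message when no entry exists.
    return catalog.get(message, message)


catalog = load_catalog(CATALOG_XML)
```

The resulting dictionary could then be written out in whatever compiled catalog format is chosen, .mo included.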

-- Marc-Andre Lemburg


Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/