[Python-Dev] gettext in the standard library

Barry A. Warsaw bwarsaw@beopen.com
Fri, 18 Aug 2000 17:13:31 -0400 (EDT)


Apologies for duplicates to those of you already on python-dev...

I've been working on merging all the various implementations of Python interfaces to the GNU gettext libraries. I've worked from code contributed by Martin, James, and Peter. I now have something that seems to work fairly well so I thought I'd update you all.

After looking at all the various wizzy and experimental stuff in these implementations, I opted for simplicity, mostly just so I could get my head around what was needed. My goal was to build a fast C wrapper module around the C library, and to provide a pure Python implementation of an identical API for platforms without GNU gettext.

I started with Martin's libintlmodule, renamed it _gettext and cleaned up the C code a bit. This provides gettext(), dgettext(), dcgettext(), textdomain(), and bindtextdomain() functions. The gettext.py module imports these, and if it succeeds, it's done.
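For readers who want to see the shape of that import-or-fall-back arrangement, here is a minimal sketch. It is not the code from gettext.py; the name `_gettext` is taken from the message above, and the fallback body and the HAVE_C_WRAPPER flag are hypothetical placeholders. On a present-day Python 3 there is no `_gettext` module, so the fallback branch is the one that normally runs.

```python
# Sketch of "use the C wrapper if it imports, otherwise fall back".
# `_gettext` is the C module named in the message; HAVE_C_WRAPPER and the
# fallback function below are illustrative assumptions, not the real module.
try:
    from _gettext import gettext, dgettext, dcgettext, textdomain, bindtextdomain
    HAVE_C_WRAPPER = True
except ImportError:
    HAVE_C_WRAPPER = False

    def gettext(message):
        # Placeholder for the pure Python path: a real fallback would look
        # the message up in a parsed .mo catalog. Here it is returned as-is.
        return message


if __name__ == "__main__":
    print("C wrapper available:", HAVE_C_WRAPPER)
    print(gettext("this is a localized message"))
```

Either way, callers only ever see one gettext() name, which is the point of keeping the two implementations API-identical.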

If that fails, then there's a bunch of code, mostly derived from Peter's fintl.py module, that reads the binary .mo files and does the lookups itself. Note that Peter's module only supported the GNU gettext binary format, and that's all mine does too. It should be easy to support other binary formats (Solaris?) by overriding one method in one class, and contributions are welcome.
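To make "reads the binary .mo files" concrete, here is a rough sketch of parsing the GNU .mo layout in Python 3. It is not the code from fintl.py or gettext.py. The function names parse_mo and build_mo are hypothetical, the strings are assumed to be UTF-8 (real catalogs declare their charset in a header entry), and plural forms and the hash table are ignored. build_mo exists only so the parser can be tried without a real catalog on disk.

```python
import struct

LE_MAGIC = 0x950412DE  # magic number as read from a little-endian .mo file
BE_MAGIC = 0xDE120495  # same magic, seen when the file is big-endian


def parse_mo(data):
    """Return {msgid: msgstr} from the raw bytes of a GNU .mo file."""
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic == LE_MAGIC:
        order = "<"
    elif magic == BE_MAGIC:
        order = ">"
    else:
        raise ValueError("not a GNU .mo file")
    # Header after the magic: revision, string count, offset of the
    # original-strings table, offset of the translations table.
    _revision, count, orig_off, trans_off = struct.unpack_from(order + "4I", data, 4)
    catalog = {}
    for i in range(count):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        olen, ooff = struct.unpack_from(order + "2I", data, orig_off + 8 * i)
        tlen, toff = struct.unpack_from(order + "2I", data, trans_off + 8 * i)
        msgid = data[ooff:ooff + olen].decode("utf-8")    # assumption: UTF-8
        msgstr = data[toff:toff + tlen].decode("utf-8")   # assumption: UTF-8
        catalog[msgid] = msgstr
    return catalog


def build_mo(pairs):
    """Build minimal little-endian .mo bytes from (msgid, msgstr) pairs."""
    pairs = sorted(pairs)
    count = len(pairs)
    orig_off = 28                      # the header is seven 32-bit words
    trans_off = orig_off + 8 * count
    pos = trans_off + 8 * count        # string data starts after both tables
    orig_table = trans_table = strings = b""
    for msgid, _msgstr in pairs:
        raw = msgid.encode("utf-8")
        orig_table += struct.pack("<2I", len(raw), pos)
        strings += raw + b"\0"
        pos += len(raw) + 1
    for _msgid, msgstr in pairs:
        raw = msgstr.encode("utf-8")
        trans_table += struct.pack("<2I", len(raw), pos)
        strings += raw + b"\0"
        pos += len(raw) + 1
    # Hash table size 0 and offset 0: the parser above never consults it.
    header = struct.pack("<7I", LE_MAGIC, 0, count, orig_off, trans_off, 0, 0)
    return header + orig_table + trans_table + strings


if __name__ == "__main__":
    data = build_mo([("hello", "bonjour"), ("goodbye", "au revoir")])
    print(parse_mo(data))
```

Supporting another binary format would then amount to swapping out the one function that turns bytes into a dictionary, which matches the "override one method" idea above.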

James's stuff looked cool too, what I grokked of it :) but I think those should be exported as higher level features. I didn't include the ability to write .mo files or the exported Catalog objects. I haven't used the I18N services enough to know whether these are useful.

I added one convenience function, gettext.install(). If you call this, it inserts the gettext.gettext() function into the builtins namespace as `_'. You'll often want to do this, based on the I18N translatable strings marking conventions. Note that importing gettext does /not/ install by default!
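As an illustration of what installing `_' into the builtins namespace amounts to, here is a small sketch in present-day Python 3, where the module is spelled `builtins` rather than `__builtin__`. The names install_underscore and fake_gettext are hypothetical; fake_gettext is a stand-in for a real catalog lookup, not part of the module.

```python
import builtins


def install_underscore(translate):
    """Bind `translate` to the name `_` in the builtins namespace."""
    builtins.__dict__["_"] = translate


def fake_gettext(message):
    # Hypothetical stand-in for a real message catalog lookup.
    table = {"hello": "bonjour"}
    return table.get(message, message)


install_underscore(fake_gettext)

if __name__ == "__main__":
    # No import or assignment of `_` in this scope: the name is resolved
    # through builtins, which is what makes it visible in every module.
    print(_("hello"))
```

Because the binding is process-wide, it makes sense that a plain import of the module leaves builtins alone and that the application has to opt in explicitly.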

And since (I think) you'll almost always want to call bindtextdomain() and textdomain(), you can pass the domain and localedir in as arguments to install. Thus, the simple and quick usage pattern is:

import gettext
gettext.install('mydomain', '/my/locale/dir')

print _('this is a localized message')

I think it'll be easier to critique this stuff if I just check it in. Before I do, I still need to write up a test module and hack together docos. In the meantime, here's the module docstring for gettext.py. Talk amongst yourselves. :)

-Barry

"""Internationalization and localization support.

This module provides internationalization (I18N) and localization (L10N) support for your Python programs by providing an interface to the GNU gettext message catalog library.

I18N refers to the operation by which a program is made aware of multiple languages. L10N refers to the adaptation of your program, once internationalized, to the local language and cultural habits. In order to provide multilingual messages for your Python programs, you need to take the following steps:

- prepare your program by specially marking translatable strings
- run a suite of tools over your marked program files to generate raw
  message catalogs
- create language specific translations of the message catalogs
- use this module so that message strings are properly translated

In order to prepare your program for I18N, you need to look at all the strings in your program. Any string that needs to be translated should be marked by wrapping it in _('...') -- i.e. a call to the function `_'. For example:

filename = 'mylog.txt'
message = _('writing a log message')
fp = open(filename, 'w')
fp.write(message)
fp.close()

In this example, the string `writing a log message' is marked as a candidate for translation, while the strings `mylog.txt' and `w' are not.

The GNU gettext package provides a tool, called xgettext, that scans C and C++ source code looking for these specially marked strings. xgettext generates what are called `.pot' files, essentially structured, human-readable files which contain every marked string in the source code. These .pot files are copied and handed over to translators who write language-specific versions for every supported language.

For I18N Python programs, however, xgettext won't work; it doesn't understand the myriad of string types supported by Python. The standard Python distribution provides a tool called pygettext that does, though (usually in the Tools/i18n directory). This is a command line script that supports an interface similar to xgettext's; see its documentation for details. Once you've used pygettext to create your .pot files, you can use the standard GNU gettext tools to generate your machine-readable .mo files, which are what's used by this module and the GNU gettext libraries.

In the simple case, to use this module you need only add the following bit of code to the main driver file of your application:

import gettext
gettext.install()

This sets everything up so that your _('...') function calls Just Work. In other words, it installs `_' in the builtins namespace for convenience. You can skip this step and do it manually by the equivalent code:

import gettext
import __builtin__
__builtin__.__dict__['_'] = gettext.gettext

Once you've done this, you probably want to call bindtextdomain() and textdomain() to get the domain set up properly. Again, for convenience, you can pass the domain and localedir to install to set everything up in one fell swoop:

import gettext
gettext.install('mydomain', '/my/locale/dir')

"""