[Python-Dev] Small issues in gettext support

Gustavo Niemeyer niemeyer at conectiva.com
Sun Apr 25 19:40:19 EDT 2004


Hello folks,

I've been working with gettext support in Python, and found some issues I'd like to discuss with you.

First, I've noticed a difference between Python's own gettext implementation (gettext.py) and GNU gettext in the encoding of the strings they return for non-Unicode lookups. Compare the results of this code:

    import gettext
    import locale
    locale.setlocale(locale.LC_ALL, "")
    locale.textdomain("apt-cdrom-registry")
    gettext.textdomain("apt-cdrom-registry")
    print locale.gettext("Choose the available CDROMs from the list below")
    print gettext.gettext("Choose the available CDROMs from the list below")

This has shown the following:

    Escolha os CDROMs disponíves na lista abaixo
    Escolha os CDROMs disponíves na lista abaixo

The reason for this difference is clear: GNU gettext defaults to the encoding of the current locale when returning encoded strings, while gettext.py returns strings in the encoding used in the .mo file. The fix is simply to change the following code

    # Encode the Unicode tmsg back to an 8-bit string, if possible
    if self._charset:
        return tmsg.encode(self._charset)

to use the system encoding (sys.getdefaultencoding()) instead of self._charset.
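To make the difference concrete, here is a minimal sketch in Python 3 syntax. It is not the gettext.py code: encode_message() is a hypothetical helper, and ISO-8859-1 is only an assumed stand-in for whatever encoding the system reports. The same translated message comes out as different bytes depending on which charset is used for the final encode step.

```python
def encode_message(tmsg, catalog_charset, output_charset=None):
    """Hypothetical helper, not part of gettext.py.

    Encode the Unicode translation `tmsg` to bytes. Without
    `output_charset` the charset declared in the .mo file is used
    (the current behaviour); with it, the given charset wins
    (the proposed behaviour).
    """
    charset = output_charset or catalog_charset
    if charset:
        return tmsg.encode(charset)
    return tmsg

# Translated message from the example above, as a Unicode string.
msg = "Escolha os CDROMs disponíves na lista abaixo"

# Current behaviour: bytes in the catalog's charset (assumed UTF-8 here).
current = encode_message(msg, "utf-8")

# Proposed behaviour: bytes in the system's charset (assumed ISO-8859-1).
proposed = encode_message(msg, "utf-8", "iso-8859-1")

print(current == proposed)
```

The two byte strings differ only in how the accented character is encoded, which is why one of the two lines can show up garbled on a terminal that expects the other encoding.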

Regarding a similar issue, I've also noticed that we're currently missing bind_textdomain_codeset() support. This function changes the codeset used to return the translated strings.
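As a rough illustration of what such support could look like, the sketch below keeps a per-domain table of output codesets. It is written in Python 3 syntax, and the table and the encode_for_domain() helper are hypothetical names invented for this example. Only the calling convention follows the GNU function of the same name: a domain plus a codeset, with no codeset meaning "just query".

```python
# Hypothetical per-domain codeset table; illustrative only.
_domain_codesets = {}

def bind_textdomain_codeset(domain, codeset=None):
    """Set the output codeset for `domain`, or query it when codeset is None."""
    if codeset is not None:
        _domain_codesets[domain] = codeset
    return _domain_codesets.get(domain)

def encode_for_domain(domain, tmsg, catalog_charset):
    """Encode a Unicode translation using the bound codeset, if any.

    Falls back to the charset declared in the catalog when no codeset
    has been bound for the domain.
    """
    charset = _domain_codesets.get(domain) or catalog_charset
    if charset:
        return tmsg.encode(charset)
    return tmsg

bind_textdomain_codeset("apt-cdrom-registry", "iso-8859-1")
```

With this in place, translations for "apt-cdrom-registry" would be returned in ISO-8859-1 regardless of the catalog's own charset, while other domains keep the fallback behaviour.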

So, I'd like to implement the following changes:

Comments?

--
Gustavo Niemeyer
http://niemeyer.net


