[Python-Dev] Bug #112265: Tkinter seems to treat everything as Latin 1

Fredrik Lundh <effbot@telia.com>
Sat, 26 Aug 2000 17:31:54 +0200

Previous message: [Python-Dev] Is Python moving too fast? (was Re: Is python commercializationazing? ...)
Next message: [Python-Dev] Bug #112265: Tkinter seems to treat everything as Latin 1
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

summary: Tkinter passes 8-bit strings to Tk without any preprocessing. Tk itself expects UTF-8, but passes bogus UTF-8 data right through... or in other words, Tkinter treats any 8-bit string that doesn't contain valid UTF-8 as an ISO Latin 1 string...

::: maybe Tkinter should raise a UnicodeError instead (just like string comparisons etc). example:

w = Label(text="<cp1250 string>")
UnicodeError: ASCII decoding error: ordinal not in range(128)

this will break existing code, but I think that's better than confusing the hell out of anyone working on a non-Latin-1 platform...
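A minimal sketch of what such a check could look like, written in modern Python 3 terms (where 8-bit strings are `bytes`) rather than against the 1.6/2.0 Tkinter source. The helper name `require_ascii` is hypothetical and is not part of Tkinter:

```python
def require_ascii(data: bytes) -> str:
    """Hypothetical helper: accept an 8-bit string only if it is plain ASCII."""
    # bytes.decode raises UnicodeDecodeError, a subclass of UnicodeError,
    # as soon as it meets a byte outside range(128).
    return data.decode("ascii")


ok = require_ascii(b"plain label")        # passes through unchanged

try:
    require_ascii(b"\xbf\xf3\xb3\xe6")    # cp1250 bytes, not ASCII
    rejected = False
except UnicodeError:
    rejected = True                       # the caller is told early
```

The point of failing early is that the caller sees the problem at the call site instead of getting silently wrong characters in the widget.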

+0 from myself -- there's no way we can get a +1 solution (source encoding) into 2.0 without delaying the release...

::: for some more background, see the bug report below, and my followup.

Summary: Impossible to get Win32 default font encoding in widgets

Details: I did not manage to obtain the correct font encoding in widgets on Win32 (NT Workstation, Polish version, default encoding cp1250). All cp1250 Polish characters were displayed incorrectly. I think all characters that do not belong to Latin-1 will be displayed incorrectly. Regarding Python1.6b1, I checked the Tcl/Tk installation (8.3.2). Pure Tcl/Tk programs DO display cp1250 characters correctly.

As far as I know, the Tcl interpreter works with UTF-8 encoded strings. Does Python1.6b1 really know about it?

Follow-Ups:

Date: 2000-Aug-26 08:04 By: effbot

Comment: this is really a "how do I", rather than a bug report ;-)

::: In 1.6 and beyond, Python's default 8-bit encoding is plain ASCII. this encoding is only used when you're using 8-bit strings in "unicode contexts" -- for example, if you compare an 8-bit string to a unicode string, or pass it to a subsystem designed to use unicode strings.

If you pass an 8-bit string containing characters outside the ASCII range to a function expecting a unicode string, the result is undefined (it usually results in an exception, but some subsystems may have other ideas).

Finally, Tkinter now supports Unicode. In fact, it assumes that all strings passed to it are Unicode. When using 8-bit strings, it's only safe to use plain ASCII.

Tkinter currently doesn't raise exceptions for 8-bit strings with non-ASCII characters, but it probably should. Otherwise, Tk will attempt to parse the string as a UTF-8 string, and if that fails, it assumes ISO-8859-1.
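The fallback described above can be imitated in a few lines. This is only an illustration in modern Python 3, not Tk's actual code; the function name `tk_like_decode` is made up for the example:

```python
def tk_like_decode(data: bytes) -> str:
    """Imitate the behaviour described above: try UTF-8, else assume Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails.
        return data.decode("latin-1")


# The Polish word U+017C U+00F3 U+0142 U+0107 encoded as cp1250.
cp1250_bytes = b"\xbf\xf3\xb3\xe6"

shown = tk_like_decode(cp1250_bytes)      # Latin-1 reading of the bytes
wanted = cp1250_bytes.decode("cp1250")    # what the author meant
mismatch = shown != wanted                # the two readings differ
```

Because the cp1250 bytes are not valid UTF-8, they fall through to the Latin-1 branch and come out as different characters, which matches the symptom in the bug report.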

::: Anyway, to write portable code using characters outside the ASCII character set, you should use unicode strings.

in your case, you can use:

s = unicode("<cp1250 string>", "cp1250")

to get the platform's default encoding, you can do:

import locale
language, encoding = locale.getdefaultlocale()

where encoding should be "cp1250" on your box.
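For readers on a current interpreter, here is a rough Python 3 equivalent of the two steps above. It is a sketch under assumptions: `bytes.decode` stands in for the `unicode()` call, `locale.getpreferredencoding(False)` stands in for `locale.getdefaultlocale()`, and the helper name `to_text` is hypothetical:

```python
import locale
from typing import Optional


def to_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode an 8-bit string, defaulting to the platform's preferred encoding."""
    if encoding is None:
        # Assumption: the bytes were produced in the platform encoding.
        encoding = locale.getpreferredencoding(False)
    return data.decode(encoding)


# Bytes as they would appear on a cp1250 box; the encoding is named
# explicitly here so the result does not depend on the local machine.
label_text = to_text(b"\xbf\xf3\xb3\xe6", "cp1250")
```

The resulting text string can then be passed to a widget, e.g. `Label(text=label_text)`, without Tk having to guess the encoding.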

::: The reason this works under Tcl/Tk is that Tcl assumes that your source code uses the platform's default encoding, and converts things to Unicode (not necessarily UTF-8) for you under the hood. Python 2.1 will hopefully support explicit source encodings, but 1.6/2.0 don't.

For detailed info, follow this link: http://sourceforge.net/bugs/?func=detailbug&bug_id=112265&group_id=5470
