[Python-Dev] IDLE and non-ASCII characters

Tim Peters tim.one@home.com
Tue, 15 May 2001 02:28:34 -0400


[Guido]

Postscript: using cut and paste, I can enter "s='äö'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed?

[Martin]

Perhaps the Tcl version? That sounds like the issue that Marc talked about: Tk behaves differently when text is entered programmatically (and perhaps through cut-n-paste), as compared to text entered through the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on Solaris 8 still gives me the UnicodeError.

I don't know which version of Python Guido used. I tried cut-&-paste of

s='äö'

from his email into the distributed 2.1 IDLE under Win98, and got

UnicodeError: ASCII encoding error: ordinal not in range(128)
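
For readers who want to see this failure mode in isolation, here is a minimal sketch in modern Python 3 (the session above used Python 2.1, so the exception class and wording differ). It assumes the pasted text reached Python as a Unicode string that was then encoded as ASCII; the thread does not confirm that this is what IDLE does internally. The helper name `try_ascii` is made up for the illustration.

```python
# Python 3 sketch; Python 2.1 reported this as
# "UnicodeError: ASCII encoding error: ordinal not in range(128)".
text = "\u00e4\u00f6"  # a-umlaut and o-umlaut, the two pasted characters


def try_ascii(s):
    """Return the ASCII bytes for s, or a short failure note (hypothetical helper)."""
    try:
        return s.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.reason holds the short explanation of why encoding failed
        return "failed: " + exc.reason


ascii_result = try_ascii(text)          # both ordinals are above 127, so this fails
latin1_result = text.encode("latin-1")  # Latin-1 maps each character to one byte

print(ascii_result)
print(latin1_result)
```

The Latin-1 bytes are the same `\xe4\xf6` pair that shows up when the paste works, which fits the idea that the successful pastes arrive as 8-bit text rather than as Unicode.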

Tk appears to interfere with the usual Windows ALT+0nnn method of entering funny characters, so I'm unsure what happens then -- but for me it either works fine or does something insane (moves the cursor to the left margin, brings up an IDLE dialog box, etc).

If I open the system Character Map utility and copy-&-paste using that, I can enter all sorts of stuff without problem:

>>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"
>>> s
'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

So not all clipboard entries are created equal.

Another clue: if I paste the s='äö' snippet from Guido's email into a file opened with Notepad, then immediately copy it again from the Notepad doc, then paste that into IDLE, again no problem:

>>> s='äö'
>>> s
'\xe4\xf6'

Using a clipboard diagnostic tool I don't understand, when I copy from Notepad these data formats are in the system clipboard:

TEXT
LOCALE
OEMTEXT

But when I copy from Guido's email under Outlook 2000, it's

DataObject
Rich Text Format
Rich Text Format Without Objects
RTF as Text
TEXT
UNICODETEXT
Ole Private Data
LOCALE
OEMTEXT

Under Character Map, it's

Rich Text Format
TEXT
LOCALE
OEMTEXT

So perhaps it's not the version of Tk but the source of the data: Tk may grab an unfortunate data format (when present) from the clipboard in preference to a fortunate one.
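
The guess can be illustrated with a small simulation. This is only a sketch: the three format lists are the ones reported above, but the `PREFERENCE` order and the `pick_format` helper are hypothetical and are not taken from Tk's source. The point is merely that a consumer that prefers the Unicode text format (CF_UNICODETEXT in the Windows API) would behave differently for the Outlook copy than for the other two.

```python
# Format lists as reported by the clipboard diagnostic tool.
NOTEPAD = ["TEXT", "LOCALE", "OEMTEXT"]
OUTLOOK = [
    "DataObject", "Rich Text Format", "Rich Text Format Without Objects",
    "RTF as Text", "TEXT", "UNICODETEXT", "Ole Private Data",
    "LOCALE", "OEMTEXT",
]
CHARMAP = ["Rich Text Format", "TEXT", "LOCALE", "OEMTEXT"]

# Hypothetical preference order, assumed for illustration only.
PREFERENCE = ["UNICODETEXT", "TEXT", "OEMTEXT"]


def pick_format(available, preference=PREFERENCE):
    """Return the first preferred format that the clipboard offers, else None."""
    for name in preference:
        if name in available:
            return name
    return None


# Formats offered only by the Outlook copy, i.e. the candidates for the culprit.
only_outlook = set(OUTLOOK) - set(NOTEPAD) - set(CHARMAP)

for source, formats in [("Notepad", NOTEPAD), ("Outlook", OUTLOOK), ("Character Map", CHARMAP)]:
    print(source, "->", pick_format(formats))
```

Under that assumed ordering, only the Outlook copy yields Unicode text, which matches the one source that triggers the UnicodeError here.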

the-clipboard-is-a-complex-beast-ly y'rs - tim