Issue 1028: Tkinter binding involving Control-spacebar raises unicode error
Created on 2007-08-26 20:48 by kbk, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (45)
Author: Kurt B. Kaiser (kbk) *
Date: 2007-08-26 20:48
The control-spacebar binding is used in IDLE to force open the completions window. It's causing IDLE to exit with a utf8 decode error. Attached is a Tkinter cut-down exhibiting the problem and a patch.
The cutdown runs ok on 2.6 but not on py3k, because the latter uses PyUnicode_FromString on all the arguments and errors out when it encounters bytes that are not valid UTF-8.
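A minimal Python 3 illustration of that failure, assuming the argument arrives as the two bytes 0xC0 0x80 (the sequence later identified in this thread as Tcl's internal null). Decoding them strictly as UTF-8, which is what PyUnicode_FromString does, raises UnicodeDecodeError. Keeping them as a byte string, which is effectively what 2.x's PyString_FromString did, cannot fail. The names `tk_arg` and `strict_decode` are illustrative only.

```python
# Assumption: Tk 8.4 hands %A over as the two bytes C0 80 (Tcl's internal null).
tk_arg = b"\xc0\x80"


def strict_decode(raw: bytes) -> str:
    """Decode the way PyUnicode_FromString does: strict UTF-8, no error handler."""
    return raw.decode("utf-8")


try:
    strict_decode(tk_arg)
    decode_error = None
except UnicodeDecodeError as exc:
    # 0xC0 can only begin an overlong encoding, so UTF-8 rejects it outright.
    decode_error = exc
    print("strict decode failed:", exc.reason, "at position", exc.start)

# A 2.x-style byte string keeps the raw bytes, so nothing is decoded and nothing fails.
raw_copy = bytes(tk_arg)
print("byte string:", raw_copy)
```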
Strangely, on my system, control-spacebar is sending a two byte string, C0E8 via the %A parameter. Control-2 does the same. Other keys with combinations of modifier keys send one byte.
Linux trader 2.6.18-ARCH #1 SMP PREEMPT Sun Nov 19 09:14:35 CET 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
Can the problem be confirmed?
Using PyUnicode_FromUnicode on %A works because the unicode string is copied instead of decoded, and that parameter is supposed to be unicode, in any case.
The patch fixes the problem on my system but should be reviewed, especially whether the cast in the call to PyUnicode_FromUnicode is suitably cross-platform.
Assigning to Neal since he's working a lot of Unicode issues right now. I can check it in if I get approval.
Author: Kurt B. Kaiser (kbk) *
Date: 2007-08-26 20:53
Heh, I see we have the same damn problem SF had: when a comment is edited, it doesn't re-wrap properly when submitted. You have to remove the returns manually after editing.
Author: Kurt B. Kaiser (kbk) *
Date: 2007-08-26 20:54
Nope, you have to make sure not to type too wide.
Author: Kurt B. Kaiser (kbk) *
Date: 2007-08-26 23:18
Well, maybe someday Tk will send a multibyte unicode character. Updated the patch.
Author: Neal Norwitz (nnorwitz) *
Date: 2007-08-26 23:40
I can confirm the problem and that your patch fixes the problem. Go ahead and check it in. Thanks!
Author: Kurt B. Kaiser (kbk) *
Date: 2007-08-27 01:57
OK, thanks for the review! I suppose Tk is sending a bad string. r57540
Author: Hirokazu Yamamoto (ocean-city) *
Date: 2008-11-18 04:38
Sorry, I reverted r57540 because it caused a segfault at IDLE exit. (See ) I reopened this issue.
Author: Guilherme Polo (gpolo) *
Date: 2008-11-19 00:34
I can reproduce it here with tk8.4, using tk8.5 doesn't cause this.
Author: Guilherme Polo (gpolo) *
Date: 2008-11-19 01:15
Here is a patch that doesn't use magic numbers :P I didn't hit the problem described in with this one, and PythonCmd should be doing this anyway, but ideally we should move to Tcl_CreateObjCommand.
Author: Guilherme Polo (gpolo) *
Date: 2008-11-19 01:19
Removed some repeated code in the patch
Author: Hirokazu Yamamoto (ocean-city) *
Date: 2008-11-19 06:01
I confirmed PythonCmd_check_for_utf.diff worked on my machine. IDLE didn't crash.
> I can reproduce it here with tk8.4, using tk8.5 doesn't cause this.
So this is a bug in tk8.4 that is solved in tk8.5, which is already a stable release? If so, I feel Python doesn't have to work around this bug.
I'm a little worried about performance, because Tcl_NumUtfChars() will be called for every command string.
By the way, I cannot reproduce this bug with tk8.4.12 (on Windows). What is your Tk version? Maybe older than that?
Author: Guilherme Polo (gpolo) *
Date: 2008-11-19 10:27
tk 8.4.19 here, but Windows and Linux almost surely use different window managers (you could run GNOME and others under Windows, but I'm betting that is not the case).
Now, it is very hard to say that we shouldn't care about this bug here. Tcl documents that the string arguments to a Tcl_CmdProc have been encoded in normalized UTF-8 since Tcl 8.1, which was released almost 10 years ago. I guess we are just lucky that this is the first time the bug was noticed.
The documentation also says that Tcl_CreateCommand shouldn't be used anymore; Tcl_CreateObjCommand should be used instead, as I said in my previous comment.
Author: Hirokazu Yamamoto (ocean-city) *
Date: 2008-11-21 08:19
I succeeded in reproducing this issue with coLinux + UltraVNC on Win2000.
Yes, py3k raised a UTF-8 error, so I tried trunk. Here is the result:
*** event.keycode: 8 *** event.state: 0 *** event.char: ''
*** event.keycode: 16 *** event.state: 4 *** event.char: '\xc0\x80'
This '\xc0\x80' seems to be what Tcl uses for the null byte '\0'. You can find this magic value in the Tcl source and on Google.
I think we should convert it to '\x00' on the Python side (it shouldn't be treated as UTF-16).
With py3k + adhok.patch, I get this result:
*** event.keycode: 8 *** event.state: 0 *** event.char: ''
*** event.keycode: 16 *** event.state: 4 *** event.char: '\x00'
Probably Tcl_GetUnicode does this conversion internally (I'm not sure, because I didn't look into the source code that deeply). And I'm not sure why this error doesn't happen with tk8.5.
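A minimal sketch of the conversion to '\x00' proposed above, done in Python on the raw argument bytes before decoding. It assumes the only non-standard sequence Tcl leaks is the two-byte null 0xC0 0x80; `decode_tcl_utf8` is a hypothetical name. The blanket replacement cannot damage legitimate text, because the byte 0xC0 never occurs in well-formed UTF-8.

```python
def decode_tcl_utf8(raw: bytes) -> str:
    """Decode bytes from Tcl, mapping Tcl's two-byte null (C0 80) to U+0000.

    0xC0 never appears in well-formed UTF-8, so replacing the pair cannot
    corrupt a valid multibyte character; everything else is decoded strictly.
    """
    return raw.replace(b"\xc0\x80", b"\x00").decode("utf-8")


print(repr(decode_tcl_utf8(b"\xc0\x80")))          # the Control-space case
print(repr(decode_tcl_utf8("caf\u00e9".encode())))  # ordinary text is untouched
```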
Author: Hirokazu Yamamoto (ocean-city) *
Date: 2008-11-21 08:21
I made a small modification to tkintertest.py. Please use this line:
my_print("*** event.char: ", repr(event.char))
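tkintertest.py itself is only attached to the issue, so the following is a hypothetical reconstruction of a script that prints events in the format shown above; `describe_event` and `main` are illustrative names, not the original code. The formatting lives in a plain function so it can be checked with a stand-in event object, without a display.

```python
from types import SimpleNamespace


def describe_event(event):
    """Format keycode, state and char as in the output above; repr() shows a NUL as '\\x00'."""
    return "*** event.keycode: %s *** event.state: %s *** event.char: %s" % (
        event.keycode, event.state, repr(event.char))


def main():
    import tkinter  # imported lazily so describe_event can be used without Tk

    root = tkinter.Tk()
    # Control-space is the binding IDLE uses to force open the completions window.
    root.bind("<Control-space>", lambda event: print(describe_event(event)))
    root.mainloop()


# Stand-in for a real Tk event, mirroring the fixed output above.
print(describe_event(SimpleNamespace(keycode=16, state=4, char="\x00")))
```

Call main() from a session with a display to watch real Control-space events.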
Author: Guilherme Polo (gpolo) *
Date: 2008-11-21 10:36
You are missing the point on using Tcl_CreateObjCommand. I didn't mean to just go and do s/Tcl_CreateCommand/Tcl_CreateObjCommand/, because if you are going to convert everything to unicode then there is no point in using Tcl_CreateObjCommand. Also, Tcl_ObjCmdProc should use Tcl_Obj *CONST objv[] instead of Tcl_Obj *const objv[], because Tcl may define CONST as nothing, and it uses CONST when defining Tcl_ObjCmdProc.
Author: Guilherme Polo (gpolo) *
Date: 2008-11-21 13:37
I'm sorry if it sounded like I was bashing you; I was just pointing out my view of the patch, and you didn't need to remove it. The patch I submitted here can also be improved (although it "works"), but I'm leaving it as a possible idea for someone else who might look into this, since I can't invest much time into this right now.
Author: Hirokazu Yamamoto (ocean-city) *
Date: 2008-11-21 13:40
> You are missing the point on using Tcl_CreateObjCommand, I didn't mean to just go and do s/Tcl_CreateCommand/Tcl_CreateObjCommand/ because if you are going to convert everything to unicode then there is no point in using Tcl_CreateObjCommand.
I'm not a Tcl/Tk expert, so I'm probably missing many things. :-( Can you explain how to solve this issue by moving to Tcl_CreateObjCommand?
> Also, Tcl_ObjCmdProc should use Tcl_Obj *CONST objv[] instead of Tcl_Obj *const objv[] because Tcl may define CONST as nothing, and it uses CONST when defining Tcl_ObjCmdProc.
I created adhok.patch just for explanation; it is not a solution. I used Tcl_CreateObjCommand + Tcl_GetUnicode to demonstrate that Tcl converts '\xc0\x80' to a null byte. (adhok.patch contained Japanese characters, so I'll repost it as just_for_explanation.patch.)
Author: Guilherme Polo (gpolo) *
Date: 2008-11-21 14:03
Hirokazu Yamamoto wrote:
> I'm not tcl/tk expert, so probably missing many things. :-( Can you explain how to solve this issue by moving to Tcl_CreateObjCommand?
By moving to Tcl_CreateObjCommand we would start using the FromObj function in _tkinter.c, which is responsible for converting Tcl objects to Python objects. What then remains to be verified is how compatible this would be with the current tkinter code, and how correct FromObj is nowadays.
Author: Guilherme Polo (gpolo) *
Date: 2008-12-01 17:45
Some more clarifications about this bug:
Tcl shouldn't be giving us a UTF-8 string with a 0xC0 byte, since that is not valid UTF-8. I'm aware that Tcl uses the sequence 0xC0 0x80 for special purposes, but it is also said that such sequences shouldn't be passed as-is when exported.
This bug doesn't affect Python 2.x because it uses PyString_FromString to convert such a value to a Python string, whereas Python 3.x uses PyUnicode_FromString, which assumes it is receiving a valid UTF-8 string; that turns out not always to be the case here.
It is indeed related to tk 8.4, but not sure which ones exactly (I hit it with tk 8.4.19).
Author: Guilherme Polo (gpolo) *
Date: 2008-12-03 21:48
I've been working on a new _tkinter (named "plumage") these days, and I hit this same problem because I trusted too much that nothing from Tcl, including Tk and extensions, would give me this embedded null.
Another bridge to Tcl (the one for Perl) also chose to check for these bytes and convert them to something else, a 0. The code for this for Python can be found at http://code.google.com/p/plumage/source/browse/trunk/src/utils.c#42 up to line 76; it could/should be adapted to the _tkinter in py3k and also for Python 2.x.
Author: (gumpy)
Date: 2008-12-07 03:07
This problem exists for me on Ubuntu 8.04 with both Tk/Tcl 8.4.16 and 8.5.
Author: Guilherme Polo (gpolo) *
Date: 2009-02-03 14:07
Can you tell me what
print(tkinter.Tcl().tk.call('info', 'patchlevel'))
prints? Specifically, I want to know which Tk 8.5.x has the problem.
Author: (gumpy)
Date: 2009-02-14 04:06
8.5.0
This is still an issue with both Tk versions in the 3.0.1 Python release.
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-07-18 09:34
More users reported this problem in #6144 and #6512.
Author: Winfried Plappert (wplappert)
Date: 2009-07-19 07:19
I have the problem described in and here is some information
Python version, hand-compiled on Ubuntu 9.04:
Python 3.1 (r31:73572, Jul 18 2009, 11:13:40)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> print(tkinter.Tcl().tk.call('info', 'patchlevel'))
8.5.6
The Tcl/Tk version I had initially installed was an 8.4.x version, which fails with the Control-2 error described in detail in .
- I decided to upgrade Tcl/Tk to version 8.5.
- So I ran "make distclean"
- copied the contents of /usr/include/tcl85 one level higher, so Python could access the necessary tk.h and tcl.h files
- cd to my Python 3.1 source
- ./configure
- make
- sudo make install
Then I tested again, and my test program now happily responds to a Control-2 keystroke.
Author: Guilherme Polo (gpolo) *
Date: 2009-08-07 14:41
Attaching a patch against trunk, I believe this solves the problems described here.
Author: Guilherme Polo (gpolo) *
Date: 2009-08-07 17:35
Uhm, in the long run I believe it will be better to move to Tcl_CreateObjCommand since it is said that commands created by it are significantly faster than the ones created by Tcl_CreateCommand (more information about this can be found at tcl documentation).
I'm only writing this because, as with other places that deal with Tcl objects, more care must be taken. For instance, I applied the .diff on the tk_and_idle_maintenance branch and found two problems that are now patched by adjusts1.diff. It is very likely that there are other bugs around; I'll be trying some tkinter applications to find some of them, but help is very much needed. Note that there are some tkinter tests on the tk_and_idle_maintenance branch and they all pass, but they do not fully cover tkinter at this moment, so improving them would be good too.
Author: Guilherme Polo (gpolo) *
Date: 2009-08-08 15:43
Today I noticed that the StringObj manpage (from Tcl) says the bytes that represent a Tcl object should be treated as read-only (although it uses char *), so this .diff may very well cause a segfault at some point.
I'm attaching a new patch that fixes this and also uses Tcl_GetStringFromObj, instead of directly accessing the bytes member of a tcl object, so we know its string representation is not invalid.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2009-09-16 11:40
Isn't this better implemented via a codec error handler?
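One way the codec-error-handler idea could look from Python, as a sketch: register a decode error handler (the name "tcl-null" and the function `tcl_null_handler` are hypothetical) that replaces 0xC0 0x80 with U+0000 and re-raises every other decoding error. `codecs.register_error` is the standard-library hook. On the C side the same handler name could, in principle, be passed as the errors argument of PyUnicode_DecodeUTF8; that wiring is an assumption, not something proposed in this thread.

```python
import codecs


def tcl_null_handler(exc):
    """Decode error handler: turn Tcl's modified-UTF-8 null (C0 80) into U+0000."""
    if isinstance(exc, UnicodeDecodeError):
        pair = exc.object[exc.start:exc.start + 2]
        if pair == b"\xc0\x80":
            # Replacement text, and the position where decoding should resume.
            return "\x00", exc.start + 2
    # Anything else is a genuine encoding problem: keep the strict behaviour.
    raise exc


codecs.register_error("tcl-null", tcl_null_handler)

print(repr(b"x\xc0\x80y".decode("utf-8", "tcl-null")))
```

Compared with rewriting the bytes up front, the handler only runs when strict decoding has already failed, so valid input takes the normal fast path.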
Author: Bernt Røskar Brenna (Bernt.Røskar.Brenna) *
Date: 2010-02-16 21:57
I can confirm that it works on Windows (using 3.1.1).
Author: Matthias Klose (doko) *
Date: 2010-03-22 14:01
exists with 3.1.2 and tcl 8.5.8
Author: Ezio Melotti (ezio.melotti) *
Date: 2010-07-12 11:04
This has been reported in #9231 (and #6144, #6512, #7884, #6920, #6424, #5156) too.
Author: Mark Lawrence (BreamoreBoy) *
Date: 2010-09-18 13:10
Could someone with commit privileges please review the patch with a view to committing, thanks.
Author: Tangaroa (jsprunck)
Date: 2010-10-12 03:28
Python 3.1.2, Ubuntu (Lucid)
Caused by Control + Shift + Spacebar
Debugger output from terminal:
Traceback (most recent call last):
  File "/usr/bin/idle-python3.1", line 5, in <module>
    main()
  File "/usr/lib/python3.1/idlelib/PyShell.py", line 1420, in main
    root.mainloop()
  File "/usr/lib/python3.1/tkinter/__init__.py", line 1012, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: illegal encoding
Will try the patch.
Author: Michael Strein (mgstrein)
Date: 2011-02-12 07:55
Do we know the status of this issue? I haven't seen an update in four months. It is currently a major headache on my Linux box.
Author: R. David Murray (r.david.murray) *
Date: 2011-04-11 01:27
Nudge: there is a report on the Ubuntu bug tracker that this is still an issue with 3.2:
https://bugs.launchpad.net/bugs/517552
Author: Kurt B. Kaiser (kbk) *
Date: 2011-05-10 14:45
Tcl/Tk uses modified utf-8 internally. This includes using 0xC080, a multibyte Unicode null character, for embedded nulls that work with C's null terminated strings. Java does the same.
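The 0xC0 0x80 pair is simply the overlong two-byte UTF-8 form of code point 0: the (all zero) bits of U+0000 are packed into the 110xxxxx 10xxxxxx template instead of using the single byte 0x00, so no 0x00 byte ever appears inside a C string. A small sketch of that arithmetic; the helper names are hypothetical, and the encoder deliberately ignores modified UTF-8's separate rule for characters outside the Basic Multilingual Plane.

```python
def two_byte_form(code_point: int) -> bytes:
    """Pack a code point below U+0800 into the 110xxxxx 10xxxxxx UTF-8 template.

    For U+0000 this yields the overlong pair C0 80 used by modified UTF-8.
    """
    if not 0 <= code_point < 0x800:
        raise ValueError("the two-byte form only covers U+0000..U+07FF")
    return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])


def encode_modified_utf8(text: str) -> bytes:
    """Standard UTF-8, except U+0000 uses the overlong pair so no 0x00 byte appears."""
    return text.encode("utf-8").replace(b"\x00", two_byte_form(0))


print(two_byte_form(0).hex())  # c080
print(encode_modified_utf8("a\x00b"))
```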
Note that typing Ctrl-space and Ctrl-2 are conventional ways to enter a null from the keyboard. That's the reason a null char is associated with those key combinations.
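The usual terminal convention behind this is that holding Control keeps only the low five bits of the key's ASCII code. Space (0x20) and '@' (0x40, which is Shift-2 on a US layout, hence Control-2) both reduce to 0x00. A small illustration of that arithmetic; `control_char` is a hypothetical helper and not how Tk itself computes %A.

```python
def control_char(key: str) -> str:
    """Apply the classic ASCII Control mapping: keep only the low 5 bits of the code."""
    return chr(ord(key) & 0x1F)


for key in (" ", "@", "A", "["):
    print(repr(key), "->", repr(control_char(key)))
```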
When Tcl exports Unicode, it is supposed to be strict utf-8. Until Tcl8.5, the %A (Unicode character corresponding to an event) was incorrectly leaking the modified Unicode null.
_tkinter.c.2.patch is narrowly focused: if PythonCmd raises a UnicodeDecodeError and if the string passed in an arg is 0xC080, it is replaced with the Unicode null 0x00.
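In Python terms, the narrowly focused fix described above amounts to roughly the following sketch. The real patch is C code in _tkinter.c's PythonCmd; `convert_arg` is a hypothetical stand-in that only illustrates the control flow: decode strictly, and only if that fails and the whole argument is exactly 0xC0 0x80, substitute U+0000.

```python
TCL_NULL = b"\xc0\x80"  # modified-UTF-8 null leaked by Tk 8.4 in %A


def convert_arg(raw: bytes) -> str:
    """Strict UTF-8 decode with a single, narrow fallback for Tk's leaked null."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if raw == TCL_NULL:
            return "\x00"
        raise  # any other invalid argument still raises, as before the patch
```

Because the fallback only matches the exact two-byte argument, any other malformed input continues to surface as an error instead of being silently altered.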
Author: Roundup Robot (python-dev)
Date: 2011-05-11 18:19
New changeset 82cfbe2ddfbb by Kurt B. Kaiser in branch '3.1': Issue #1028: Tk returns invalid Unicode null in %A: UnicodeDecodeError. http://hg.python.org/cpython/rev/82cfbe2ddfbb
Author: STINNER Victor (vstinner) *
Date: 2011-05-11 19:00
I'm working on #2857 which adds the "Modified UTF-8" ("utf-8-java"?) codec to Python. We can maybe use it instead of raising an error in 3.3?
Author: Kurt B. Kaiser (kbk) *
Date: 2011-05-11 21:20
r70039 3.1 forward ported > 3.2 > default. Will be in 3.2.1.
Author: Kurt B. Kaiser (kbk) *
Date: 2011-05-11 21:48
Having a modified utf-8 codec will be useful. That said, it is an error for Tcl/Tk to expose modified utf-8 externally, and that was fixed at some point in Tk8.5. Since Tk is no longer sending 0xC080 for the %A char, switching codecs in _tkinter.c won't accomplish anything.
This fix was to correct a long-standing problem in IDLE using Tk8.4, which is most easily solved by catching the leaked invalid null in _tkinter.c.
It seems to me that, once you switch to modified utf-8 and allow the embedded nulls, you have to make sure everything you are doing uses the modified utf-8 encoding/decoding.
Author: Mike Perry (Mike.Perry)
Date: 2012-02-13 05:55
Hello,
I am still able to reproduce this issue with Python 3.2.2. It seems as if this bug was closed with the note:
> r70039 3.1 forward ported > 3.2 > default. Will be in 3.2.1.
This leads me to believe that either 3.2.2 has a regression or the patch never made it into 3.2.1.
Can anyone chime in with some more details?
Thanks,
Mike
Author: Mike Perry (Mike.Perry)
Date: 2012-02-13 07:46
Figured I should capture the exception. See below.
3.2.2+ (default, Jan 8 2012, 07:22:26) [GCC 4.6.2]
Traceback (most recent call last):
  File "/usr/bin/idle3", line 5, in <module>
    main()
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1429, in main
    root.mainloop()
  File "/usr/lib/python3.2/tkinter/__init__.py", line 1012, in mainloop
    self.tk.mainloop(n)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 0: invalid start byte
Author: Terry J. Reedy (terry.reedy) *
Date: 2012-05-27 20:10
On Win 7, a brings up the box on all the latest releases: 2.7.3, 3.2.3, and 3.3.0a3. (These all come with recent tk 8.5.x.)
Mike, please retest with 3.2.3 and specify os and tk version and exactly what you entered if there is still a problem.
Author: Mike Perry (Mike.Perry)
Date: 2012-06-05 02:30
Looking good in 3.2.3! Tested on Debian Wheezy using packages python3-tk 3.2.3-1 and idle3 3.2.3~rc1-2.
History
Date | User | Action | Args
2022-04-11 14:56:26 | admin | set | github: 45369
2012-06-05 02:30:35 | Mike.Perry | set | messages: +
2012-05-27 20:10:31 | terry.reedy | set | nosy: + terry.reedy; messages: +; versions: - Python 3.1
2012-02-13 07:46:45 | Mike.Perry | set | messages: +
2012-02-13 05:55:42 | Mike.Perry | set | nosy: + Mike.Perry; messages: +
2011-08-04 01:58:31 | r.david.murray | link |
2011-05-11 21:48:14 | kbk | set | messages: +
2011-05-11 21:20:52 | kbk | set | status: open -> closed; resolution: accepted -> fixed; messages: +; stage: patch review -> resolved
2011-05-11 19:00:25 | vstinner | set | nosy: + vstinner; messages: +
2011-05-11 18:19:43 | python-dev | set | nosy: + python-dev; messages: +
2011-05-10 14:45:05 | kbk | set | files: + _tkinter.c.2.patch; assignee: ned.deily -> kbk; components: + Unicode; nosy: + kbk; messages: +; resolution: accepted
2011-04-11 01:27:25 | r.david.murray | set | nosy: + r.david.murray; messages: +; versions: + Python 3.2, Python 3.3
2011-02-12 07:57:39 | georg.brandl | set | nosy: + ned.deily
2011-02-12 07:55:55 | mgstrein | set | nosy: + mgstrein; messages: +
2010-10-28 08:21:15 | ned.deily | link |
2010-10-12 03:28:57 | jsprunck | set | nosy: + jsprunck, - nnorwitz, terry.reedy, doko, kbk, amaury.forgeotdarc, ocean-city, gpolo, ezio.melotti, wplappert; messages: +; versions: - Python 2.7, Python 3.2
2010-09-18 13:10:56 | BreamoreBoy | set | nosy: + terry.reedy, BreamoreBoy; messages: +; versions: - Python 2.6
2010-07-13 02:11:34 | kbk | set | priority: normal -> high
2010-07-12 11:04:51 | ezio.melotti | set | versions: + Python 3.2, - Python 3.0; nosy: nnorwitz, doko, kbk, amaury.forgeotdarc, ocean-city, gpolo, ezio.melotti, wplappert, Bernt.Røskar.Brenna; messages: +; components: + IDLE; keywords: + needs review; stage: patch review
2010-03-22 14:01:33 | doko | set | nosy: + doko; messages: +
2010-02-20 14:22:07 | gumpy | set | nosy: - gumpy
2010-02-16 21:57:48 | Bernt.Røskar.Brenna | set | nosy: + Bernt.Røskar.Brenna; messages: +
2009-09-16 11:40:08 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: +
2009-08-08 15:43:02 | gpolo | set | files: + issue1028_2.diff; messages: +
2009-08-07 17:35:08 | gpolo | set | files: + adjusts1.diff; messages: +
2009-08-07 14:41:17 | gpolo | set | files: + issue1028.diff; messages: +; versions: + Python 2.6, Python 2.7
2009-07-19 07:19:40 | wplappert | set | nosy: + wplappert; messages: +; versions: + Python 3.1
2009-07-18 09:39:51 | ezio.melotti | link |
2009-07-18 09:39:35 | ezio.melotti | link |
2009-07-18 09:38:41 | ezio.melotti | set | superseder: [IDLE] UnicodeDecodeError when invoking force-open-completions ->
2009-07-18 09:36:32 | ezio.melotti | set | superseder: [IDLE] UnicodeDecodeError when invoking force-open-completions
2009-07-18 09:34:55 | ezio.melotti | set | priority: normal; nosy: + ezio.melotti; messages: +; type: behavior
2009-02-14 04:06:27 | gumpy | set | messages: +
2009-02-03 14:07:36 | gpolo | set | messages: +
2008-12-07 03:07:25 | gumpy | set | nosy: + gumpy; messages: +
2008-12-03 21:48:43 | gpolo | set | messages: +
2008-12-01 17:45:17 | gpolo | set | messages: +
2008-11-21 14:03:07 | gpolo | set | messages: +
2008-11-21 13:40:05 | ocean-city | set | files: + just_for_explanation.patch; messages: +
2008-11-21 13:37:39 | gpolo | set | messages: +
2008-11-21 13:31:02 | ocean-city | set | files: - adhok.patch
2008-11-21 10:36:01 | gpolo | set | messages: +
2008-11-21 08:21:54 | ocean-city | set | messages: +
2008-11-21 08:19:11 | ocean-city | set | files: + adhok.patch; messages: +
2008-11-19 10:27:07 | gpolo | set | messages: +
2008-11-19 06:01:21 | ocean-city | set | messages: +
2008-11-19 01:19:34 | gpolo | set | files: + PythonCmd_check_for_utf.diff; messages: +
2008-11-19 01:19:15 | gpolo | set | files: - PythonCmd_check_for_utf.diff
2008-11-19 01:15:36 | gpolo | set | files: + PythonCmd_check_for_utf.diff; messages: +
2008-11-19 00:34:18 | gpolo | set | nosy: + gpolo; messages: +
2008-11-18 04:38:29 | ocean-city | set | status: closed -> open; nosy: + ocean-city; resolution: accepted -> (no value); messages: +
2008-01-06 22:29:45 | admin | set | keywords: - py3k; versions: Python 3.0
2007-09-10 21:28:35 | loewis | link |
2007-08-27 01:57:04 | kbk | set | status: open -> closed; messages: +
2007-08-26 23:40:38 | nnorwitz | set | assignee: nnorwitz -> kbk; resolution: accepted; messages: +; nosy: + nnorwitz
2007-08-26 23:18:36 | kbk | set | files: + _tkinter.c.patch; messages: +
2007-08-26 23:17:33 | kbk | set | files: - _tkinter.c.patch
2007-08-26 20:54:14 | kbk | set | messages: +
2007-08-26 20:53:20 | kbk | set | files: + _tkinter.c.patch; messages: +
2007-08-26 20:48:31 | kbk | create |