msg138389 - (view) |
Author: wujek (wujek.srujek) |
Date: 2011-06-15 20:59 |
The following code produces an exception when executed in IDLE 3.2:

print('{:c}'.format(65536))

The stack trace:

>>> print('{:c}'.format(65536))
Traceback (most recent call last):
  File "<pyshell#149>", line 1, in <module>
    print('{:c}'.format(65536))
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1231, in write
    self.shell.write(s, self.tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1213, in write
    OutputWindow.write(self, s, tags, "iomark")
  File "/usr/lib/python3.2/idlelib/OutputWindow.py", line 40, in write
    self.text.insert(mark, s, tags)
  File "/usr/lib/python3.2/idlelib/Percolator.py", line 25, in insert
    self.top.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 316, in insert
    UndoDelegator.insert(self, index, chars, tags)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 81, in insert
    self.addcmd(InsertCommand(index, chars, tags))
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 116, in addcmd
    cmd.do(self.delegate)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 219, in do
    text.insert(self.index1, self.chars, self.tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/WidgetRedirector.py", line 104, in __call__
    return self.tk_call(self.orig_and_operation + args)
ValueError: unsupported character

It seems to work fine in a terminal (gnome-terminal in this case):

>>> print('{:c}'.format(0x10000))
𐀀

(my font doesn't have the glyph, but otherwise it works)

Python version:

>>> print(sys.version)
3.2 (r32:88445, Mar 25 2011, 19:56:22)
[GCC 4.5.2]

OS:

wujek@home:~$ uname -a
Linux studio 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
wujek@home:~$ cat /etc/issue
Ubuntu 11.04 |
|
|
msg138390 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2011-06-15 21:10 |
Judging from the stack trace, it isn't str.format that's failing, it's tk failing to display it. |
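A minimal sketch of that point, assuming Python 3.3 or later (where every code point is a single str element): the formatting step succeeds on its own, and the result simply contains a character above U+FFFF. The helper name non_bmp_chars is hypothetical and is not part of IDLE or tkinter.

```python
def non_bmp_chars(text):
    """Return the characters of `text` whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# str.format builds the string without raising; no Tk call is involved here.
s = '{:c}'.format(0x10000)

# The string holds exactly one character, and it lies outside the BMP.
print([hex(ord(ch)) for ch in non_bmp_chars(s)])
```

The failure only appears later, when the string is handed to the Tk text widget.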
|
|
msg138392 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-15 21:47 |
U+10000 is not the most common character in fonts. You should try another character in the U+10000-U+10FFFF range (non-BMP characters). The new emoticons are in this range, but I don't know if your Ubuntu setup includes a font supporting it. http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf |
|
|
msg138395 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2011-06-15 21:59 |
From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example. I don't think there is anything practical IDLE or tkinter can do about that. |
|
|
msg138397 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-15 22:01 |
> From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl
> 8.5 (and earlier) does not support Unicode code points outside
> the BMP range as in this example.

Extract of http://wiki.tcl.tk/1364:

"RS 2008-07-09: Unicode out of BMP (> U+FFFF) requires a deeper rework of Tcl and Tk: we'd need 32 bit chars and/or surrogate pairs. UTF-8 at least can deal with 31-bit Unicodes by principle."

> I don't think there is anything practical IDLE
> or tkinter can do about that.

We might raise an error with a better message than ValueError('unsupported character'), but that may be overkill. |
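For readers unfamiliar with the surrogate pairs mentioned in the wiki extract, here is a rough sketch of the standard UTF-16 arithmetic that splits a code point above U+FFFF into two 16-bit units. The function name to_surrogate_pair is hypothetical; it illustrates the encoding only and is not something Tcl or _tkinter exposes.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the U+10000-U+10FFFF range")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10000)
print('U+%04X U+%04X' % (high, low))
```

A 16-bit character type can store each half, but the code consuming it has to know that the two units form one character.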
|
|
msg138402 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2011-06-15 22:17 |
It looks like that error message has been in _tkinter.c since 2002: http://svn.python.org/view/python/trunk/Modules/_tkinter.c?r1=28989&r2=28990&; I suppose it could be slightly more informative but it seems pretty unambiguous to me. Martin, any opinions? |
|
|
msg138497 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-06-17 10:54 |
Instead of

ValueError: unsupported character

I suggest:

ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

What do you think? |
|
|
msg138541 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2011-06-17 18:31 |
> ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

Slightly shorter and without the double colons:

ValueError: character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.

I agree with a change like this. People are going to increasingly use non-BMP chars and need to find out that the problem is not our fault. |
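As an illustration only, the proposed wording could be produced by a check like the one below. This is a Python stand-in with a hypothetical name (check_tcl_range); the real check lives in C inside _tkinter.c, and this sketch assumes Python 3.3 or later so that each code point is one str element.

```python
TCL_MAX_CODE_POINT = 0xFFFF  # upper bound of the BMP, per the discussion


def check_tcl_range(text):
    """Return `text` unchanged, or raise if it has a character above U+FFFF."""
    for ch in text:
        if ord(ch) > TCL_MAX_CODE_POINT:
            raise ValueError(
                "character U+%04X is above the range (U+0000-U+FFFF) "
                "allowed by Tcl/Tk" % ord(ch))
    return text


try:
    check_tcl_range('{:c}'.format(0x10000))
except ValueError as exc:
    print(exc)
```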
|
|
msg146663 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2011-10-30 21:46 |
(Merging CC list from duplicate Issue13265.) |
|
|
msg146665 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2011-10-30 22:33 |
Changing the error message sounds fine to me. People in need of the feature should lobby their system vendors to provide a Tcl build that uses a 32-bit Tcl_UniChar. Not sure whether it would actually render the string correctly, but at least it would be able to represent it correctly internally. |
|
|
msg146965 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-11-03 19:54 |
Here is the patch as a .patch file. |
|
|
msg146983 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2011-11-03 21:39 |
I'm not sure whether the wording is good English, but apart from that, the patch looks fine. |
|
|
msg146984 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2011-11-03 21:49 |
The patch implements my suggestion. Looking again, I think the English is fine ;-). |
|
|
msg146987 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2011-11-03 22:14 |
You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing). |
|
|
msg146991 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-11-03 23:42 |
New changeset 9a07b73abdb1 by Victor Stinner in branch '3.2':
Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/9a07b73abdb1

New changeset 5aea95d41ad2 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/5aea95d41ad2 |
|
|
msg146992 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-11-03 23:49 |
_tkinter now raises ValueError("character U+10ffff is above the range (U+0000-U+FFFF) allowed by Tcl").

> You could say "Unicode character ..." in the error to make clear
> what kind of range is U+0000-U+FFFF (people that are not familiar
> with Unicode and BMP chars might wonder if that's some tcl/tk thing).

I consider that U+10ffff in "character U+10ffff" is enough to specify that it is a Unicode character. Even if you don't understand Unicode, you can at least compare numbers (0x10ffff is not in the range [0x0000; 0xFFFF]) ;-) |
|
|
msg146994 - (view) |
Author: Florent Xicluna (flox) *  |
Date: 2011-11-04 00:27 |
Failed to build these modules: (3.3 on Snow Leopard)
_tkinter

./cpython/Modules/_tkinter.c: In function ‘AsObj’:
./cpython/Modules/_tkinter.c:996: warning: dereferencing ‘void *’ pointer
./cpython/Modules/_tkinter.c:996: error: invalid use of void expression |
|
|
msg146999 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-11-04 08:49 |
New changeset 5f49b496d161 by Victor Stinner in branch 'default': Issue #12342: Fix compilation on Mac OS X http://hg.python.org/cpython/rev/5f49b496d161 |
|
|
msg154966 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2012-03-05 17:59 |
In responding to #14200, it occurred to me that better than an exception would be doing what the interpreter does in a Command Prompt window, which is to expand high chars to the escaped '\U0001xxxx' form. |
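A rough sketch of that idea, assuming Python 3.3 or later: replace each character above U+FFFF with its '\Uxxxxxxxx' escape before the text reaches Tk, and leave BMP characters alone. The name escape_non_bmp is hypothetical and is not claimed to be what #14200 actually implemented.

```python
def escape_non_bmp(text):
    """Replace characters above U+FFFF with their '\\Uxxxxxxxx' escape."""
    return ''.join(
        ch if ord(ch) <= 0xFFFF else '\\U%08x' % ord(ch)
        for ch in text)


# BMP characters pass through; the non-BMP one becomes a ten-character escape.
print(escape_non_bmp('a{:c}b'.format(0x10000)))
```

The escaped text contains only BMP characters, so displaying it never triggers the ValueError.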
|
|
msg155414 - (view) |
Author: Roger Serwy (roger.serwy) *  |
Date: 2012-03-11 22:11 |
I agree with Terry. The current behavior of raising ValueError will lead to problems in application code in the future if Tkinter gets fixed such that it can render Unicode properly beyond 0xFFFF. |
|
|
msg155804 - (view) |
Author: Andrew Svetlov (asvetlov) *  |
Date: 2012-03-14 21:48 |
Fixed in #14200 |
|
|
msg155809 - (view) |
Author: Roger Serwy (roger.serwy) *  |
Date: 2012-03-14 22:15 |
Rather than raising a ValueError, would UnicodeEncodeError be more appropriate? I admit that this suggestion may be bike shedding. |
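One point relevant to this question: UnicodeEncodeError is a subclass of ValueError in Python's exception hierarchy, so code that already catches ValueError would keep working if the exception type were narrowed. A small sketch; the encoding label 'tcl-bmp' and the reason string are made up for illustration and are not what _tkinter uses.

```python
# UnicodeEncodeError -> UnicodeError -> ValueError in the built-in hierarchy.
print(issubclass(UnicodeEncodeError, ValueError))

try:
    # Arguments: encoding, object, start, end, reason.
    # 'tcl-bmp' is a hypothetical label, not a registered codec.
    raise UnicodeEncodeError(
        'tcl-bmp', '\U00010000', 0, 1, 'character above U+FFFF')
except ValueError as exc:
    # An existing "except ValueError" handler still catches the narrower type.
    caught = type(exc).__name__

print(caught)
```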
|
|