Issue 12342: characters with ord above 65535 fail to display in IDLE


Status: closed Resolution: duplicate
Superseder: Idle shell crash on printing non-BMP unicode character (issue 14200)
Assigned To: asvetlov Nosy List: Ramchandra Apte, asvetlov, eric.smith, ezio.melotti, flox, kbk, loewis, ned.deily, python-dev, r.david.murray, roger.serwy, terry.reedy, vstinner, wujek.srujek
Priority: normal Keywords: patch

Created on 2011-06-15 20:59 by wujek.srujek, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name  Uploaded  Description
tcl_unicode_range.patch  vstinner, 2011-11-03 19:54
Messages (22)
msg138389 - Author: wujek (wujek.srujek) Date: 2011-06-15 20:59
The following code produces an exception:

print('{:c}'.format(65536))

when executed in Idle 3.2. The stack trace:

>>> print('{:c}'.format(65536))
Traceback (most recent call last):
  File "<pyshell#149>", line 1, in <module>
    print('{:c}'.format(65536))
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1231, in write
    self.shell.write(s, self.tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1213, in write
    OutputWindow.write(self, s, tags, "iomark")
  File "/usr/lib/python3.2/idlelib/OutputWindow.py", line 40, in write
    self.text.insert(mark, s, tags)
  File "/usr/lib/python3.2/idlelib/Percolator.py", line 25, in insert
    self.top.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 316, in insert
    UndoDelegator.insert(self, index, chars, tags)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 81, in insert
    self.addcmd(InsertCommand(index, chars, tags))
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 116, in addcmd
    cmd.do(self.delegate)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 219, in do
    text.insert(self.index1, self.chars, self.tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/WidgetRedirector.py", line 104, in __call__
    return self.tk_call(self.orig_and_operation + args)
ValueError: unsupported character

Seems to work fine in a terminal (Gnome-terminal in this case):

>>> print('{:c}'.format(0x10000))
𐀀

(my font doesn't have the glyph, but otherwise it works)

Python version:

>>> print(sys.version)
3.2 (r32:88445, Mar 25 2011, 19:56:22)
[GCC 4.5.2]

Os:

wujek@home:~$ uname -a
Linux studio 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
wujek@home:~$ cat /etc/issue
Ubuntu 11.04
msg138390 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-15 21:10
Judging from the stack trace, it isn't str.format that's failing; it's Tk failing to display the character.
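You can confirm this outside IDLE, with no Tk involved: build the same string and inspect it. The sketch below assumes Python 3.3 or later. Under PEP 393, every code point there is a single str character. On a 3.2 "narrow" build, the same value would be stored as two UTF-16 code units instead.

```python
# Build the same character as the report; str.format itself succeeds.
s = '{:c}'.format(65536)

# On Python 3.3+ (PEP 393) the result is a one-character string.
print(len(s))            # 1
print(hex(ord(s)))       # 0x10000

# U+10000 is the first code point past the Basic Multilingual Plane
# (U+0000-U+FFFF), the range Tcl/Tk 8.5 can represent.
print(ord(s) > 0xFFFF)   # True
```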
msg138392 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-15 21:47
U+10000 is not the most common character in fonts. You should try another character in the U+10000-U+10FFFF range (non-BMP characters). The new funny emoticons are in this range, but I don't know if your Ubuntu setup includes a font supporting this range. http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf
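When testing with other characters from U+10000-U+10FFFF, it helps to know which characters in a piece of text fall outside the BMP. The helper below, non_bmp_chars, is hypothetical and not part of IDLE or tkinter. It assumes Python 3.3+, where each code point is a single str character.

```python
def non_bmp_chars(text):
    """Return (index, code point) pairs for characters above U+FFFF."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# U+1F600 is from the emoticon block linked above; U+00E9 is inside the BMP.
sample = 'smile: \U0001F600, A: \U00010000, e: \u00e9'
for index, cp in non_bmp_chars(sample):
    print(index, f'U+{cp:04X}')
```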
msg138395 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-15 21:59
From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example. I don't think there is anything practical IDLE or tkinter can do about that.
msg138397 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-15 22:01
> From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl
> 8.5 (and earlier) does not support Unicode code points outside
> the BMP range as in this example.

Extract of http://wiki.tcl.tk/1364 :

"RS 2008-07-09: Unicode out of BMP (> U+FFFF) requires a deeper rework of Tcl and Tk: we'd need 32 bit chars and/or surrogate pairs. UTF-8 at least can deal with 31-bit Unicodes by principle."

> I don't think there is anything practical IDLE
> or tkinter can do about that.

We might raise an error with a better error message than ValueError('unsupported character'), but it may be overkill.
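The "surrogate pairs" option in the quoted wiki note refers to UTF-16. UTF-16 stores a code point above U+FFFF as two 16-bit units: a high surrogate followed by a low surrogate. The sketch below shows the standard computation, cross-checked against Python's own UTF-16 codec. The name to_surrogate_pair is only illustrative.

```python
def to_surrogate_pair(cp):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f'U+{cp:04X} is not a supplementary-plane code point')
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))            # 0xd800 0xdc00

# Cross-check against Python's UTF-16 encoder (big-endian, no BOM).
encoded = chr(0x1F600).encode('utf-16-be')
print(encoded.hex())                  # d83dde00
```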
msg138402 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-15 22:17
It looks like that error message has been in _tkinter.c since 2002: http://svn.python.org/view/python/trunk/Modules/_tkinter.c?r1=28989&r2=28990&; I suppose it could be slightly more informative but it seems pretty unambiguous to me. Martin, any opinions?
msg138497 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-17 10:54
Instead of

ValueError: unsupported character

I suggest:

ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

What do you think?
msg138541 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-17 18:31
> ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

Slightly shorter and without the double :s.

ValueError: character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.

I agree with a change like this. People are going to increasingly use non-BMP chars and need to find out that the problem is not our fault.
msg146663 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-10-30 21:46
(Merging CC list from duplicate Issue13265.)
msg146665 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-10-30 22:33
Changing the error message sounds fine to me. People in need of the feature should lobby their system vendors to provide a Tcl build that uses a 32-bit Tcl_UniChar. Not sure whether it would actually render the string correctly, but at least it would be able to represent it correctly internally.
msg146965 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-03 19:54
Here is the patch as a .patch file.
msg146983 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-03 21:39
I'm not sure whether the wording is good English, but apart from that, the patch looks fine.
msg146984 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-11-03 21:49
The patch implements my suggestion. Looking again, I think the English is fine ;-).
msg146987 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-03 22:14
You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing).
msg146991 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-03 23:42
New changeset 9a07b73abdb1 by Victor Stinner in branch '3.2':
Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/9a07b73abdb1

New changeset 5aea95d41ad2 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/5aea95d41ad2
msg146992 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-03 23:49
_tkinter now raises ValueError("character U+10ffff is above the range (U+0000-U+FFFF) allowed by Tcl").

> You could say "Unicode character ..." in the error to make clear
> what kind of range is U+0000-U+FFFF (people that are not familiar
> with Unicode and BMP chars might wonder if that's some tcl/tk thing).

I consider that U+10ffff in "character U+10ffff" is enough to specify that it is a Unicode character. Even if you don't understand Unicode, you can at least compare numbers (0x10ffff is not in the range [0x0000; 0xFFFF]) ;-)
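The committed change is C code in _tkinter. The following is only a hypothetical pure-Python sketch of the same range check and message format; the function name check_tcl_range is invented for illustration. It assumes Python 3.3+, where each code point is one str character. It uses lowercase hex, as in the message quoted above.

```python
def check_tcl_range(text):
    """Hypothetical mirror of the check: reject the first char above U+FFFF."""
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Same wording as the committed message, with lowercase hex digits.
            raise ValueError(
                f'character U+{cp:x} is above the range '
                '(U+0000-U+FFFF) allowed by Tcl')
    return text


try:
    check_tcl_range('ok \U0010FFFF')
except ValueError as exc:
    print(exc)
```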
msg146994 - Author: Florent Xicluna (flox) * (Python committer) Date: 2011-11-04 00:27
Failed to build these modules: (3.3 on Snow Leopard)
_tkinter

./cpython/Modules/_tkinter.c: In function ‘AsObj’:
./cpython/Modules/_tkinter.c:996: warning: dereferencing ‘void *’ pointer
./cpython/Modules/_tkinter.c:996: error: invalid use of void expression
msg146999 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-04 08:49
New changeset 5f49b496d161 by Victor Stinner in branch 'default': Issue #12342: Fix compilation on Mac OS X http://hg.python.org/cpython/rev/5f49b496d161
msg154966 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-03-05 17:59
In responding to #14200, it occurred to me that better than an exception would be doing what the interpreter does in a Command Prompt window, which is to expand high chars to the '\U0001xxxx' escaped form.
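Here is a minimal sketch of the escaping Terry describes, assuming Python 3.3+. escape_non_bmp is a hypothetical helper; it is not the fix that was applied in #14200.

```python
def escape_non_bmp(text):
    """Escape characters above U+FFFF; leave BMP characters untouched."""
    # e.g. U+1F600 becomes the ten characters \U0001f600 (8 hex digits).
    return ''.join(
        ch if ord(ch) <= 0xFFFF else f'\\U{ord(ch):08x}'
        for ch in text
    )


print(escape_non_bmp('smile \U0001F600!'))   # smile \U0001f600!
```

BMP characters pass through unchanged, so only text that Tcl 8.5 cannot hold gets rewritten. The built-in str.encode('ascii', 'backslashreplace') would also escape accented BMP letters such as 'é'. That is more than this use needs.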
msg155414 - Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-11 22:11
I agree with Terry. The current behavior of raising ValueError will lead to problems in application code in the future if Tkinter gets fixed such that it can render Unicode properly beyond 0xFFFF.
msg155804 - Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-14 21:48
Fixed in #14200
msg155809 - Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:15
Rather than raising a ValueError, would UnicodeEncodeError be more appropriate? I admit that this suggestion may be bike shedding.
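Here is a sketch of what Roger's alternative could look like, using a hypothetical helper named check_tcl_range_strict. The encoding label 'ucs-2' passed to UnicodeEncodeError is an arbitrary choice for illustration; it is not something _tkinter uses. One fact is relevant to the bikeshed: UnicodeEncodeError is a subclass of ValueError (via UnicodeError). Code that already catches ValueError would therefore keep working.

```python
def check_tcl_range_strict(text):
    """Hypothetical variant raising UnicodeEncodeError instead of ValueError."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            # Arguments: encoding name, the full string, start, end, reason.
            raise UnicodeEncodeError(
                'ucs-2', text, i, i + 1,
                'character is above the range (U+0000-U+FFFF) allowed by Tcl')
    return text


# UnicodeEncodeError -> UnicodeError -> ValueError.
print(issubclass(UnicodeEncodeError, ValueError))   # True

try:
    check_tcl_range_strict('x\U00010000')
except ValueError as exc:
    print(type(exc).__name__, exc.start, exc.end)    # UnicodeEncodeError 1 2
```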
History
Date User Action Args
2022-04-11 14:57:18 admin set github: 56551
2012-03-14 22:15:48 roger.serwy set messages: +
2012-03-14 21:48:11 asvetlov set status: open -> closed; assignee: asvetlov; versions: - Python 2.7, Python 3.2; messages: +; superseder: Idle shell crash on printing non-BMP unicode character; resolution: fixed -> duplicate; stage: commit review -> resolved
2012-03-12 18:52:35 asvetlov set nosy: + asvetlov
2012-03-11 22:11:41 roger.serwy set messages: +
2012-03-05 17:59:56 terry.reedy set messages: +
2011-11-04 08:49:30 python-dev set messages: +
2011-11-04 00:27:38 flox set status: closed -> open; nosy: + flox; messages: +
2011-11-03 23:49:35 vstinner set status: open -> closed; resolution: fixed; messages: +
2011-11-03 23:42:25 python-dev set nosy: + python-dev; messages: +
2011-11-03 22:14:50 ezio.melotti set messages: +
2011-11-03 21:49:02 terry.reedy set messages: +; stage: commit review
2011-11-03 21:39:33 loewis set messages: +
2011-11-03 19:54:37 vstinner set files: + tcl_unicode_range.patch; keywords: + patch; messages: +
2011-10-30 22:33:31 loewis set messages: +
2011-10-30 21:46:55 ned.deily set nosy: + kbk, ezio.melotti, roger.serwy, Ramchandra Apte; messages: +
2011-10-30 21:45:57 ned.deily link issue13265 superseder
2011-06-17 18:31:10 terry.reedy set messages: +; components: + Tkinter, - IDLE, IO; versions: + Python 2.7, Python 3.3
2011-06-17 10:54:44 vstinner set messages: +
2011-06-16 02:20:54 eric.smith set nosy: + eric.smith
2011-06-15 22:17:20 ned.deily set nosy: + loewis; messages: +
2011-06-15 22:01:49 vstinner set messages: +
2011-06-15 21:59:06 ned.deily set nosy: + ned.deily; messages: +
2011-06-15 21:47:17 vstinner set nosy: + vstinner; messages: +
2011-06-15 21:10:07 r.david.murray set nosy: + r.david.murray, terry.reedy; messages: +; title: characters with ord above 65535 fail conversion with str.format for '{:c}' in IDLE -> characters with ord above 65535 fail to display in IDLE
2011-06-15 20:59:59 wujek.srujek create