Issue 18118: curses utf8 output broken in Python2

Created on 2013-06-02 10:14 by helmut, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg190479 - Author: (helmut) Date: 2013-06-02 10:14
Consider the test case below.

<<<
#!/usr/bin/python
# -*- encoding: utf8 -*-

import curses

def wrapped(screen):
    screen.addstr(0, 0, "ä")
    screen.addstr(0, 1, "ö")
    screen.addstr(0, 2, "ü")
    screen.getch()

if __name__ == "__main__":
    curses.wrapper(wrapped)
>>>

Expected output: "äöü"
Output on py3.3: as expected
Output on py2.7.3: "?ü"

The actual bytes (as determined by strace) were "\303\303\303\274". Observe the inclusion of broken utf8 sequences.

This issue was initially discovered on Debian sid, but independently confirmed on Arch Linux and on two other, unidentified systems.
msg190483 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-02 11:08
I believe this is one of a class of bugs that are fixed in Python3, and that are unlikely to be fixed in Python2. I'll defer to Victor, though, who made a number of curses unicode fixes in Python3.
msg190500 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-06-02 20:12
Is your Python curses module linked to libncurses.so.5 or libncursesw.so.5? Example:

$ ldd /usr/lib/python2.7/lib-dynload/_cursesmodule.so |grep curses
        libncursesw.so.5 => /lib/libncursesw.so.5 (0x00375000)

libncursesw has much better Unicode support than libncurses. Since Python 3.3, the Python curses.window.addstr() method uses waddwstr() when the module is linked to libncursesw, which also improves the Unicode support.
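The linkage check can also be scripted. The following is only a sketch: the function names `linked_curses_libraries` and `is_wide_build` are hypothetical helpers, not part of the curses module, and the sample text is the single `ldd` line quoted in the message above. On a real system the input would come from running `ldd` on the path of the compiled curses extension.

```python
import re


def linked_curses_libraries(ldd_output):
    """Return the curses shared-library names listed in `ldd` output.

    Hypothetical helper: it only looks at the text left of "=>".
    """
    pattern = r"^\s*(lib\w*curses\w*\.so[\w.]*)\s*=>"
    return re.findall(pattern, ldd_output, re.MULTILINE)


def is_wide_build(ldd_output):
    """True if any linked curses library is the wide-character variant."""
    return any(name.startswith("libncursesw")
               for name in linked_curses_libraries(ldd_output))


# The line quoted in the message above, used as sample input.
sample = "        libncursesw.so.5 => /lib/libncursesw.so.5 (0x00375000)\n"

print(linked_curses_libraries(sample))
print(is_wide_build(sample))
```

As the rest of the thread shows, a wide build is necessary but not sufficient: the Python 2.7 module is linked against libncursesw yet still calls the byte-oriented functions.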
msg190501 - Author: (helmut) Date: 2013-06-02 20:22
All reproducers confirmed that their _cursessomething.so is linked against libncursesw.so.5.
msg190503 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-06-02 20:45
u"äöü" encoded to "utf-8" gives '\xc3\xa4\xc3\xb6\xc3\xbc'.

"\303\303\303\274" is '\xc3\xc3\xc3\xbc'.

I guess that curses considers that '\xc3\xa4' is a string of 2 characters: screen.addstr(0, 1, "ö") replaces the second "character", '\xa4'.

I suppose that screen.addstr(0, 0, u"äöü".encode("utf-8")) works.

If "_cursessomething.so" is already linked against libncursesw.so.5, the fix is to use waddwstr(), but such a change cannot be made in a minor release like Python 2.7.6. So I'm closing this issue as wont fix => you have to move to Python 3.3.
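The guess above can be checked with a small model. This is a sketch under a stated assumption: it treats a screen row as a list of cells holding one byte each, which is a simplification for illustration and not how ncurses stores its data. The function name `simulate_byte_cells` is hypothetical.

```python
def simulate_byte_cells(writes, width=8):
    """Model a screen row in which every cell stores exactly one byte.

    `writes` is a list of (column, text) pairs, applied in order like
    successive addstr(0, column, text) calls.
    """
    row = [None] * width
    for column, text in writes:
        for offset, byte in enumerate(bytearray(text.encode("utf-8"))):
            row[column + offset] = byte  # overwrites any earlier byte
    return bytes(bytearray(b for b in row if b is not None))


# Same three calls as in the test case: "ä", "ö", "ü" at columns 0, 1, 2.
writes = [(0, u"\u00e4"), (1, u"\u00f6"), (2, u"\u00fc")]
row_bytes = simulate_byte_cells(writes)

# Each write clobbers the continuation byte of the previous character.
print(repr(row_bytes))
print(row_bytes == b"\303\303\303\274")  # the bytes reported from strace
```

Under that model the second byte of "ä" and of "ö" is overwritten by the lead byte of the next character, which reproduces the reported sequence '\xc3\xc3\xc3\xbc'.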
msg190519 - Author: (helmut) Date: 2013-06-03 06:03
> I suppose that screen.addstr(0, 0, u"äöü".encode("utf-8")) works.

It works as in "the output looks as the one expected". Long lines with utf8 characters will make it break again though.

screen.addstr(0, 0, "äöü" * 20) # assuming COLUMNS=80

Will give two rows of characters of which the first row is 40 characters long.

> If "_cursessomething.so" is already linked against libncursesw.so.5, the fix is to use waddwstr(), but such change cannot be done in a minor release like Python 2.7.6. So I'm closing this issue as wont fix => you have to move to Python 3.3.

Sounds sensible. Are you aware of a workaround for this issue? I.e. is there any way to force Python2.7 to use the wide mode for outputting characters?
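The 40-character first row is what one would expect if lines wrap by byte count rather than by character count. The arithmetic can be sketched as follows, assuming (as in the message) an 80-column terminal and one byte per cell; the variable names are illustrative only.

```python
columns = 80                         # assumed terminal width (COLUMNS=80)
text = u"\u00e4\u00f6\u00fc" * 20    # "äöü" * 20
encoded = text.encode("utf-8")       # two bytes per character

# Split the byte string at the terminal width, one byte per cell.
first_row = encoded[:columns]
second_row = encoded[columns:]

print(len(text))                         # characters in the string
print(len(encoded))                      # bytes in the string
print(len(first_row.decode("utf-8")))    # characters on the first row
print(len(second_row.decode("utf-8")))   # characters on the second row
```

The 60 characters fit comfortably in 80 columns, but their 120 bytes do not, so the line wraps after 80 bytes, which is 40 characters.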
msg190521 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-06-03 07:40
"Sounds sensible. Are you aware of a workaround for this issue? I.e. is there any way to force Python2.7 to use the wide mode for outputting characters?"

I don't think that it is possible to work around this issue; it is a bug in the design of curses, related to Unicode. I suppose that libncursesw uses an array of wchar_t characters when the *_wch() and *wstr() functions are used, whereas your version appears to use an array of char and so is unable to understand that a character is composed of two bytes (ex: b"\xc3\xa4" for u"ä").
History
Date                 User            Action  Args
2022-04-11 14:57:46  admin           set     github: 62318
2013-06-03 07:40:39  vstinner        set     messages: +
2013-06-03 06:03:34  helmut          set     messages: +
2013-06-02 20:45:42  vstinner        set     status: open -> closed; resolution: wont fix; messages: +
2013-06-02 20:22:27  helmut          set     messages: +
2013-06-02 20:12:17  vstinner        set     messages: +
2013-06-02 11:08:10  r.david.murray  set     nosy: + vstinner, r.david.murray; messages: +; title: curses utf8 output broken -> curses utf8 output broken in Python2
2013-06-02 10:14:02  helmut          create