Issue 479898: Multibyte string on string::string_print

Created on 2001-11-09 07:10 by hyeshik.chang, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name | Uploaded | Description
python_mbstring_diff.txt | hyeshik.chang, 2001-11-09 07:10 | patch to Objects/stringobject.c
configure.in.diff.txt | hyeshik.chang, 2001-12-10 03:20 | 2nd) autoconf detection for mbtowc(), iswprint()
pyconfig.h.in.diff.txt | hyeshik.chang, 2001-12-10 03:21 | 2nd) autoconf detection for mbtowc(), iswprint()
stringobject.c.diff.txt | hyeshik.chang, 2001-12-10 03:22 | 2nd) new, clean (in my view) patch for Objects/stringobject.c
mb3.diff | hyeshik.chang, 2002-04-01 18:06 | 3rd) revised (includes patches for stringobject.c, configure.in and pyconfig.h.in)
Messages (10)
msg38131 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2001-11-09 07:10
Many users of multibyte languages find it difficult to read native characters when they are printed inside a list, dictionary, and so on. This patch allows printing multibyte characters on UNIX98-compatible machines; mbtowc() is an ISO/IEC 9899:1990 standard C API function.
msg38132 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2001-11-09 21:21
Even though I think this patch is correct in principle, I see a few problems with it:
1. Since it doesn't fix a bug, it probably cannot go into 2.2.
2. There is no autoconf test for mbtowc. You should test this in configure, and then conditionalize your code on HAVE_MBTOWC.
3. There is too much code duplication. Try to find a solution which special-cases the escape codes (\something) only once. For example, you may implement a trivial mbtowc redefinition if mbtowc is not available, and then use mbtowc always.
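Editor's sketch of the fallback suggested in point 3, not taken from any of the attached patches. It assumes configure defines HAVE_MBTOWC when mbtowc() is available; the name my_mbtowc is hypothetical. When mbtowc() is missing, the shim treats every byte as one character, so the calling code can use a single code path.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Assumption: configure defines HAVE_MBTOWC when mbtowc() exists. */
#ifdef HAVE_MBTOWC
#define my_mbtowc mbtowc
#else
/* Hypothetical trivial fallback: every byte is one character. */
static int
my_mbtowc(wchar_t *pwc, const char *s, size_t n)
{
    if (s == NULL)
        return 0;               /* encoding is not state-dependent */
    if (n == 0)
        return -1;              /* nothing to convert */
    if (pwc != NULL)
        *pwc = (wchar_t)(unsigned char)*s;
    return *s != '\0';          /* 0 for the null byte, else 1 byte consumed */
}
#endif
```

With a shim like this, the escaping logic only needs to be written once, against my_mbtowc().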
msg38133 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-12-04 19:08
I don't understand the point of using mbtowc() here. The code extracts a wide character, but then it uses isprint() on it, and as far as I know, isprint() is not defined on wide characters, only on 'unsigned char' (and on -1). Isn't what the author wants simply to use isprint(c) instead of (c < ' ' || c >= 0x7f)?
msg38134 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2001-12-06 15:12
You are right, the code should use iswprint instead. The point is that multiple subsequent bytes can make up a single printable character. Not every character above 127 is necessarily printable (e.g. in Latin-1, only characters above 160 are printable). Likewise, a single byte may not be printable, but a combination will print fine. So this code is supposed to catch only those cases where printing will actually work.
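Editor's sketch of the idea described above, not the code from the attached patches. The helper name print_escaped is hypothetical, and quote and backslash handling is omitted. It decodes one character at a time with mbtowc() in the current locale, writes the bytes through unchanged when iswprint() accepts the decoded character, and otherwise emits a \xNN escape for a single byte.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: write s[0..len) to out, escaping every byte that
   is not part of a printable (possibly multibyte) character.
   Returns the number of bytes that were escaped. */
static int
print_escaped(FILE *out, const char *s, size_t len)
{
    size_t i = 0;
    int escaped = 0;

    mbtowc(NULL, NULL, 0);              /* reset any shift state */
    while (i < len) {
        wchar_t wc;
        int cr = mbtowc(&wc, s + i, len - i);

        if (cr > 0 && iswprint((wint_t)wc)) {
            /* The whole byte sequence forms one printable character. */
            fwrite(s + i, 1, (size_t)cr, out);
            i += (size_t)cr;
        }
        else {
            /* Invalid sequence, null byte, or non-printable character. */
            fprintf(out, "\\x%02x", (unsigned char)s[i]);
            i++;
            escaped++;
            mbtowc(NULL, NULL, 0);      /* reset after a failed conversion */
        }
    }
    return escaped;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first, so that mbtowc() decodes according to the user's encoding; without that call the program stays in the "C" locale.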
msg38135 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2001-12-07 06:38
Yes, it should be changed to iswprint on Linux systems. (However, isprint on BSD systems was designed to handle wide characters.) As loewis said, the EUC encodings of Korea, Japan, and Taiwan don't use 0x7F-0x9F for printable characters. So I think that using mbtowc is unavoidable.
msg38136 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-12-07 13:21
Still, the patch as it exists is unacceptable -- it needs configure support to decide whether to use mbtowc() and whether to use iswprint() or isprint() (I would hope on BSD there is also an iswprint(), to be standard-conforming).
msg38137 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2001-12-10 03:26
I uploaded a second set of patches, which includes configure support. Unfortunately, Citrus (the new-generation locale support for *BSDs) hasn't implemented iswprint() yet, but the *BSDs support wide characters via the Rune Locale isprint() function.
msg38138 - Author: Hyeshik Chang (hyeshik.chang) (Python committer) Date: 2001-12-10 03:38
Oops, one mistake, sorry. At stringobject.c:646,
    else if (_ISPRINT(c)) {
should be
    else if (cr > 0 && _ISPRINT(c)) {
(to detect whether mbtowc failed to convert).
msg38139 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2002-10-07 13:58
Thanks for the patch; committed as configure 1.343, configure.in 1.354, pyconfig.h.in 1.51, and stringobject.c 2.190. I'm not quite sure that your correction is correct: if we invoke iswprint, cr is already guaranteed to be > 0, since otherwise we goto nonprintable.
msg38140 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2002-10-11 05:38
The patch was causing too many problems, so I had to back it out.
History
Date User Action Args
2022-04-10 16:04:37 admin set github: 35494
2001-11-09 07:10:11 hyeshik.chang create