Issue 1103023: raw_input problem with readline and UTF8
Backspace doesn't remove all bytes of a multi-byte UTF-8 character.
To reproduce the problem:

$ export LANG=en_US.UTF-8
$ python
Python 2.3.4 (#1, Jun 11 2004, 16:35:29)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r3, ssp-3.3-7, pie-8.5.3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import readline
>>> raw_input() # ä, return
ä
'\xc3\xa4'
>>> raw_input() # ä, backspace, return
'\xc3'
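The leftover '\xc3' is what you would expect if backspace removed a single byte instead of the whole character. The following is only a sketch of that difference, written for Python 3; the helper names backspace_bytewise and backspace_utf8 are made up for illustration and are not part of readline or of Python's readline module.

```python
def backspace_bytewise(buf: bytes) -> bytes:
    """Drop only the final byte, as a byte-oriented line editor would."""
    return buf[:-1]


def backspace_utf8(buf: bytes) -> bytes:
    """Drop the whole final UTF-8 character (lead byte plus continuation bytes)."""
    end = len(buf) - 1
    # Continuation bytes have the bit pattern 10xxxxxx; step back over them
    # until we reach the lead byte of the last character.
    while end > 0 and (buf[end] & 0xC0) == 0x80:
        end -= 1
    return buf[:max(end, 0)]


typed = "\u00e4".encode("utf-8")        # the two bytes of "ä"
print(repr(typed))                      # b'\xc3\xa4'
print(repr(backspace_bytewise(typed)))  # b'\xc3'  (the symptom reported here)
print(repr(backspace_utf8(typed)))      # b''
```

The byte-wise result matches the second raw_input() call above, which suggests the line editor was treating the input as single-byte characters.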
A small C program does not have the same problem:
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

void pprint(const char *s);

int main(void)
{
    char *line;

    for (;;) {
        line = readline("> ");
        if (!line)
            break;
        pprint(line);
        free(line);
    }
    return 0;
}

/* Print printable bytes as-is and every other byte as a \xNN escape. */
void pprint(const char *s)
{
    while (*s) {
        if (isprint((unsigned char)*s))
            putchar(*s);
        else
            printf("\\x%x", *s & 0xff);
        s++;
    }
    putchar('\n');
}
Logged In: YES user_id=65253
Hi, it looks like this might be the same problem already fixed in "[ 914291 ] Fix readline for utf-8 locales", but your Python version is from before the fix went in. Can you try again with Python 2.4, or with 2.3.5 (once it is released, or from the release23-maint branch of CVS)?
Also, you could check whether Python's readline.so is linked against an older version of libreadline.
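One way to do that check is to locate the readline extension module and run ldd on it. The snippet below is a rough sketch only: it is written for Python 3 (on 2.3 you would print readline.__file__ and run ldd by hand), it assumes a Linux system with ldd on the PATH, and the function names are made up for illustration.

```python
import importlib.util
import shutil
import subprocess


def readline_library_lines(ldd_output):
    """Return the lines of ldd output that mention libreadline or libedit."""
    return [line.strip() for line in ldd_output.splitlines()
            if "libreadline" in line or "libedit" in line]


def inspect_readline_module():
    """List the line-editing libraries the readline module links to.

    Returns None when the module is missing, compiled into the interpreter,
    or when ldd is not available.
    """
    spec = importlib.util.find_spec("readline")
    if spec is None or not spec.origin or spec.origin in ("built-in", "frozen"):
        return None
    ldd = shutil.which("ldd")
    if ldd is None:
        return None
    result = subprocess.run([ldd, spec.origin], capture_output=True, text=True)
    return readline_library_lines(result.stdout)


if __name__ == "__main__":
    print(inspect_readline_module())
```

The version suffix in the reported library name (the number after ".so.") shows which libreadline the module was built against.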