Issue 1103023: raw_input problem with readline and UTF8

Backspace doesn't remove all bytes of a multi-byte UTF-8 character.

To reproduce the problem:

$ export LANG=en_US.UTF-8
$ python
Python 2.3.4 (#1, Jun 11 2004, 16:35:29)
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r3, ssp-3.3-7, pie-8.5.3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import readline
>>> raw_input() # ä, return
ä
'\xc3\xa4'
>>> raw_input() # ä, backspace, return

'\xc3'
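
For illustration, the '\xc3' result is what you would expect if backspace deleted a single byte rather than a whole character. The sketch below is written for Python 3 (not the 2.3.4 interpreter used above), and both helper names are hypothetical; they only model the byte arithmetic and are not part of readline.

```python
# Illustrative sketch only (Python 3). The helper names are hypothetical
# and do not correspond to any readline API.

def backspace_bytewise(buf):
    """Drop only the final byte, as a byte-oriented line editor would."""
    return buf[:-1]


def backspace_charwise(buf, encoding="utf-8"):
    """Drop the final character, however many bytes it occupies."""
    return buf.decode(encoding)[:-1].encode(encoding)


line = "ä".encode("utf-8")               # two bytes in UTF-8
print(repr(line))                        # the full encoding of the character
print(repr(backspace_bytewise(line)))    # a dangling lead byte remains
print(repr(backspace_charwise(line)))    # the whole character is removed
```

The byte-wise variant leaves the lead byte of the sequence behind, which matches the second raw_input() result in the session above.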

A small C program does not have the same problem:

#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

void pprint(const char *s);

int main(void)
{
    char *line;

    for (;;) {
        line = readline("> ");
        if (!line)
            break;
        pprint(line);
        free(line);
    }

    return 0;
}

/* Print printable bytes as-is and every other byte as a \xNN escape. */
void pprint(const char *s)
{
    while (*s) {
        /* Cast first: passing a negative char to isprint() is undefined. */
        if (isprint((unsigned char)*s))
            putchar(*s);
        else
            printf("\\x%x", *s & 0xff);
        s++;
    }
    putchar('\n');
}

Logged In: YES user_id=65253

Hi, it looks like this might be the same problem already fixed in "[ 914291 ] Fix readline for utf-8 locales", but your Python version predates that fix. Can you try again with Python 2.4, or with 2.3.5 (when it is released, or from the release23-maint branch of CVS)?

Also, you could check whether Python's readline.so is linked against an older version of libreadline.
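
One possible way to do that check from Python is sketched below. It assumes a Linux system with `ldd` on the PATH and is written for Python 3. The function names are hypothetical, and the path to the readline extension module has to be supplied by the caller.

```python
# Rough sketch, assuming Linux with `ldd` available. The function names
# are hypothetical helpers, not part of Python or readline.
import subprocess


def readline_lines(ldd_output):
    """Return the lines of ldd output that mention libreadline."""
    return [ln.strip() for ln in ldd_output.splitlines()
            if "libreadline" in ln]


def linked_readline(module_path):
    """Run ldd on the given shared object and keep the libreadline lines."""
    result = subprocess.run(["ldd", module_path],
                            capture_output=True, text=True, check=True)
    return readline_lines(result.stdout)
```

Calling `linked_readline()` with the path to the interpreter's readline.so should list whichever libreadline the module resolves to at load time. If the list is empty, readline may be linked statically or built into the interpreter.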