Inputting some Unicode characters (like 'łąśćńó...') causes the interactive session to abort. When the console session is set to use the UTF-8 code page (65001), the session ends abruptly as soon as a diacritic character appears in a string. The debug output suggests that some cleanup is performed, but there are no error messages indicating what caused the problem. I spotted the problem on Windows 10 (Technical Preview), but I may try to reproduce it on a released operating system.
---
C:\>chcp 1250
Active code page: 1250

C:\>python -i
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 'ł'
'ł'
>>> exit()

C:\>chcp 65001
Active code page: 65001

C:\>python -i
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 'ł'

C:\

This issue looks like a duplicate of issue #1602: windows console doesn't print or input Unicode. It's a limitation of Windows, not of Python itself. Python supports any Unicode character if the output is written to a file (encoded in UTF-8). Workaround: use IDLE or another Python "REPL" (interactive interpreter) which has better Unicode support.
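
For what it's worth, here is a minimal sketch of the file case mentioned above, assuming a Python 3 interpreter. The temporary directory and the file name out.txt are only illustrative and are not taken from the report. It writes the characters from the report to a UTF-8 encoded file and reads them back without involving the console at all:

```python
import os
import tempfile

# Sample text from the original report.
text = "łąśćńó"

# Hypothetical output location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Write with an explicit encoding so the console code page plays no role.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes and the decoded text back for comparison.
with open(path, "rb") as f:
    raw = f.read()
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Passing encoding="utf-8" explicitly matters here, because on Windows the default encoding for open() may otherwise be taken from the locale.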
This isn't a Python bug. The Windows console doesn't properly support UTF-8. See issue 1602 and Drekin's win-unicode-console, an alternative REPL based on the wide-character (UCS-2) console API.

FWIW, I attached a debugger to conhost.exe under Windows 7 to inspect what's happening here. In the client, the CRT's read() function calls WinAPI ReadFile. For a console handle this calls either ReadConsoleA or (in Windows 8+) NtReadFile. Either way, most of the action happens in the server process, conhost.exe. The server's input buffer is Unicode, which gets encoded to CP 65001 (UTF-8) by calling WideCharToMultiByte. However, the server incorrectly assumes that the current codepage is a Windows ANSI codepage with a one-to-one mapping, i.e. that each 16-bit wchar_t maps to an 8-bit char in the current codepage. Since 'ł' gets UTF-8 encoded as the two-byte string b'\xc5\x82', the allocated buffer is too small by a byte. The server doesn't recover from this failure by allocating a larger buffer. It just reports back to the client process that it read 0 bytes. The CRT in turn sets the end-of-file (EOF) flag on the stdin FILE stream, which causes Python to exit 'normally'.
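
To make the size mismatch concrete, here is a small sketch in plain Python. It doesn't touch the console API and only illustrates the arithmetic. The helper name encoded_sizes is made up for this example. It compares how many bytes 'ł' needs in code page 1250 versus UTF-8, against the single 16-bit code unit it occupies in a wide-character buffer:

```python
def encoded_sizes(text, codepages=("cp1250", "utf-8")):
    """Return a dict mapping each codec name to the encoded byte length of text."""
    return {cp: len(text.encode(cp)) for cp in codepages}


char = "ł"

# Number of 16-bit code units the character takes in a UTF-16 buffer.
utf16_units = len(char.encode("utf-16-le")) // 2

# Bytes needed after conversion to each target code page.
sizes = encoded_sizes(char)

print("UTF-16 code units:", utf16_units)
for codepage, nbytes in sizes.items():
    print(codepage, "bytes:", nbytes)
```

Under the one-to-one assumption described above, a buffer sized from the code-unit count fits the CP 1250 result but is one byte short for the UTF-8 result.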