[Python-3000] PEP 3138- String representation in Python 3000

Atsuo Ishimoto ishimoto at gembook.org
Thu May 15 19:50:22 CEST 2008

Previous message: [Python-3000] PEP 3138- String representation in Python 3000
Next message: [Python-3000] PEP 3138- String representation in Python 3000

On Fri, May 16, 2008 at 1:49 AM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 15/05/2008, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
>> I would like to call it "improve", not break :)
>
> Please can you help me understand the impact here. I am running Windows XP (UK English - console code page 850, which is some variety of Latin 1). Currently, printing non-latin1 characters gives me an exception: for example,
>
> >>> print("Hello\u03C8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "D:\Apps\Python30\lib\io.py", line 1103, in write
>     b = s.encode(self.encoding)
>   File "D:\Apps\Python30\lib\encodings\cp850.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character '\u03c8' in position 5: character maps to <undefined>
>
> (This is 3.0a1 - I don't know if much has changed in more recent alphas, if it's significant I can upgrade and try again).
>
> Can you explain what I need to change to make sys.stdout behave as you propose? If you can do that, I can test what I will see in your proposal if I type print(repr("Hello\u03C8")). My suspicion is that I will see unreadable garbage, rather than what I currently get, which is backslash-escaped, but readable.

With my proposal, print("Hello\u03C8") prints "Hello\u03C8", with the unprintable character escaped, instead of raising an exception. And print(repr("Hello\u03C8")) prints "'Hello\u03C8'", so no garbage is printed.
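The difference between the current behaviour and the proposed one can be sketched with the standard codecs machinery. This is a minimal sketch, assuming the proposal amounts to giving the console stream the 'backslashreplace' error handler instead of 'strict'; the in-memory io.TextIOWrapper (named console here, a hypothetical stand-in) imitates a cp850 console and is not how sys.stdout is actually built. Note that the handler writes lowercase hex digits.

```python
import io

s = "Hello\u03C8"  # GREEK SMALL LETTER PSI is not in code page 850

# 'strict' handler (what Paul sees): encoding to cp850 raises.
strict_error = None
try:
    s.encode("cp850")
except UnicodeEncodeError as exc:
    strict_error = exc.reason  # the "character maps to <undefined>" message

# Assumed proposal: a cp850 text stream using 'backslashreplace'.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp850",
                           errors="backslashreplace", newline="\n")
print(s, file=console)        # psi is written as a backslash escape
print(repr(s), file=console)  # the repr is escaped on output the same way
console.flush()
output = raw.getvalue()       # the bytes a cp850 console would receive
```

Both lines reach the byte stream as plain ASCII escapes, so nothing unreadable is sent to the cp850 console.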

Now, let's say you are Greek and working on a Greek version of XP. Then print("Hello\u03C8") prints "Hello" followed by the correct Greek character (GREEK SMALL LETTER PSI), and print(repr("Hello\u03C8")) prints "'Hello" + the correct Greek character + "'". If you have a Greek font, you can try this by switching your command prompt to code page 1253 with "chcp 1253".
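Why the Greek console shows the real character comes down to what each code page can represent. A minimal sketch, assuming Python's cp1253 and cp850 codecs match the Windows code pages 1253 and 850; the repr() result reflects Python 3 as released, where printable non-ASCII characters are left unescaped.

```python
s = "Hello\u03C8"

# Code page 1253 (Windows Greek) has GREEK SMALL LETTER PSI at byte 0xF8,
# so the string encodes without needing any error handler.
greek_bytes = s.encode("cp1253")

# Code page 850 has no psi, so the same string needs a fallback there.
latin_bytes = s.encode("cp850", "backslashreplace")

# Python 3 repr() keeps printable non-ASCII characters as they are.
shown = repr(s)
```

On a cp1253 console, shown is therefore encoded to the same 0xF8 byte and displayed as the Greek letter.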

> The key point here is that I don't think you're proposing to detect the user's display capabilities and adapt the output to match, so if my display can't cope with the full Unicode character set, I'll have to make manual adjustments or see broken output.

Python already detects the user's capabilities, and has done so since Python 2.x (or 1.6? I forget). On Windows, Python detects the user's encoding from the console code page; on Unix, it uses the locale.
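The detected values can be inspected from Python itself. A minimal sketch using sys.stdout.encoding and locale.getpreferredencoding(); the results depend on the environment. On Windows, getpreferredencoding() reports the ANSI code page, which can differ from the console code page, and since Python 3.6 the interactive Windows console uses UTF-8 (PEP 528), so a modern interpreter will not report what Paul's 3.0a1 did.

```python
import locale
import sys

# Encoding Python chose for the console stream (None if stdout has been
# replaced by an object without an 'encoding' attribute).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# The locale's preferred encoding (on Unix derived from LANG/LC_* settings).
preferred = locale.getpreferredencoding(False)

print("sys.stdout.encoding:", stdout_encoding)
print("locale preferred encoding:", preferred)
```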

> Like it or not, a large proportion of Python's users still work in environments where much of the Unicode character space is not displayed readably.

I agree. So rejecting my proposal as "not a common use case" might be reasonable. But I should argue for it to get some sympathy, anyway :).

> One point I forgot to clarify is that I'm fully aware that print(arbitrary_string) may display garbage, if the string contains Unicode that my display can't handle. The key point for me is that print(repr(arbitrary_string)) is guaranteed to display correctly, even on my limited-capability terminal, precisely because it only uses ASCII and no matter how dumb, all terminals I know of display ASCII.
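The ASCII-only guarantee Paul relies on is what Python 3 as released provides through the built-in ascii(), while repr() keeps printable non-ASCII characters. A minimal sketch, where sample_strings is a hypothetical list of inputs chosen to cover Greek, Latin-1, control characters and a non-BMP character:

```python
import ast

# Hypothetical inputs covering several kinds of non-ASCII content.
sample_strings = [
    "Hello\u03C8",        # printable Greek letter
    "caf\u00e9",          # printable Latin-1 letter
    "\x1b[31mred\x07",    # ESC sequence and BEL control characters
    "\u200b\U0001F600",   # zero-width space and an astral-plane emoji
]

escaped_forms = []
for s in sample_strings:
    escaped = ascii(s)
    # ascii() output is always pure ASCII, so any terminal can show it.
    assert all(ord(ch) < 128 for ch in escaped)
    # The escaped form is still a literal that evaluates back to s.
    assert ast.literal_eval(escaped) == s
    escaped_forms.append(escaped)

# Printing only the ASCII forms is safe even on an ASCII-only stdout.
for line in escaped_forms:
    print(line)
```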

I can understand your concern. Perhaps you don't want to see your terminal flashing from escape sequences, beeping, or filling up with endless graphic characters. For legacy byte-string applications (whether written in C or Python), printing an arbitrary string can cause such a mess. But this is unlikely to happen when printing a Unicode string, since the characters your terminal cannot handle will be escaped or converted to a character such as '?'.
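The fallbacks mentioned here correspond to the codec error handlers. A minimal sketch, assuming a cp850 console: 'replace' substitutes '?', 'backslashreplace' writes an escape, and 'ignore' drops the character. These handlers only apply to characters the code page cannot encode; control characters such as ESC are encodable and pass through unchanged, whereas repr() escapes them.

```python
s = "Hello\u03C8 \u2603"  # psi and SNOWMAN, neither is in code page 850

# Each handler decides what to emit for characters cp850 cannot encode.
replaced = s.encode("cp850", "replace")          # substitutes b"?"
escaped = s.encode("cp850", "backslashreplace")  # writes a backslash escape
dropped = s.encode("cp850", "ignore")            # silently drops them

# Control characters are encodable, so no handler is involved for them;
# repr() is what escapes them.
control = "\x1b[2J"
raw_control = control.encode("cp850", "replace")
control_repr = repr(control)
```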

Hope this helps.


More information about the Python-3000 mailing list