[Python-3000] PEP 3138- String representation in Python 3000

M.-A. Lemburg mal at egenix.com
Thu May 8 18:52:02 CEST 2008

Previous message: [Python-3000] PEP 3138- String representation in Python 3000
Next message: [Python-3000] PEP 3138- String representation in Python 3000

On 2008-05-06 15:55, Atsuo Ishimoto wrote:

(I changed the subject)

Thank you for your comment.

On Tue, May 6, 2008 at 8:45 PM, M.-A. Lemburg <mal at egenix.com> wrote:

For sys.stdout this doesn't make sense at all, since it hides encoding errors for all applications using sys.stdout as a piping mechanism. -1 on that.

You can raise UnicodeEncodeError for encoding errors if you want, by setting sys.stdout's error handler to 'strict'.

No, that's not a good idea. I don't want to change every single affected application just to make sure that they don't write corrupt data to stdout.

Both are really way beyond the scope of the PEP and I don't really see the need for them.

Even though this PEP was rejected,

You mean PEP 3138 was rejected ??

I'll still propose changing the default error handler for sys.stdout and sys.stderr to 'backslashreplace'. For Python 2, the 'strict' error handler is acceptable because most text data are 8-bit strings, but for Py3K, raising exceptions when the printed text contains a character not supported by the console is annoying.

Well, "annoying" is not good enough for such a big change :-)
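The two error handlers being argued over can be compared without touching a real console. The sketch below assumes an ASCII-only console and stands one in with an io.TextIOWrapper over an in-memory buffer (in Python 3, sys.stdout is itself an io.TextIOWrapper); the helper name write_to_console is hypothetical.

```python
import io


def write_to_console(text: str, errors: str) -> bytes:
    """Write text to a simulated ASCII-only console and return the bytes it receives.

    write_to_console is a hypothetical helper: the "console" is an
    io.TextIOWrapper over a BytesIO, configured like a stdout whose
    encoding cannot represent every character.
    """
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    out.write(text)  # with errors="strict" this raises UnicodeEncodeError
    out.flush()
    return raw.getvalue()


# 'backslashreplace' silently turns the unencodable character into an escape.
print(write_to_console("café", "backslashreplace"))  # b'caf\\xe9'

# 'strict' (the default being defended) reports the problem instead.
try:
    write_to_console("café", "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
```

In Python 3.7 and later the handler of the real stream can also be switched at runtime with sys.stdout.reconfigure(errors="backslashreplace"); that method did not exist when this thread was written.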

Please also consider the different situations you are addressing:

- console output (i.e. printing)
- stdout file output (i.e. piping)
- interactive session use (i.e. running print at the Python prompt)

The backslashreplace idea may have some merits in interactive Python sessions or IDLE, but it hides encoding errors in all other situations.
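Why backslashreplace hides errors outside an interactive session can be illustrated with a minimal sketch, assuming a producer that encodes for an ASCII pipe and a consumer that decodes whatever bytes it receives; pipe_through_ascii is a hypothetical name. Two different strings arrive as the same text, and neither side ever sees an exception.

```python
def pipe_through_ascii(text: str) -> str:
    """Simulate producer -> ASCII pipe -> consumer (hypothetical helper)."""
    # Producer side: what a stdout using 'backslashreplace' would emit.
    payload = text.encode("ascii", errors="backslashreplace")
    # Consumer side: decodes the received bytes; no error is raised anywhere.
    return payload.decode("ascii")


original = "café"       # contains U+00E9
lookalike = r"caf\xe9"  # seven ASCII characters, including a real backslash

received_a = pipe_through_ascii(original)
received_b = pipe_through_ascii(lookalike)

print(received_a == received_b)  # True: the consumer cannot tell them apart
print(received_a == original)    # False: the data was changed in transit
```

With 'strict', the producer would have failed loudly on the first string instead of passing on ambiguous data.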

They also don't cover the cases where you write the repr() to a log file, some stream or syslog.

Sure. I missed some cases, such as the cgitb module or the logging module. I'll investigate them later. If you have another candidate, please let me know.

You have to address the general use cases, not just specific implementations in the Python stdlib - those can easily be changed, but doing the same in all the existing code out there that wants to get ported to Py3k is a different issue.

I'm not against changing the repr() of Unicode objects, but please make sure that this change does not break debugging of Python applications, whether you're debugging an app using 'print' statements, piping repr() output through a socket to a remote debugger, or writing information to a log file. The important factor to take into account is the other end that will receive the data.

BTW: One problem that your PEP doesn't address, which I mentioned on the ticket:

By putting all printable chars into the repr() you lose the ability to actually see the number of code points you have in a Unicode string.

A Unicode-aware editor, shell or pager will display the data as glyphs and not as code points, i.e. glyphs expressed using combining code points will appear as one "character" to the user, even though the Unicode object contains multiple code points. As a result, the length and any indexes you might use in the debugging session will not match what the user sees in his shell window.
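The code point problem can be reproduced with a decomposed "é". The sketch below assumes Python 3 semantics, where repr() leaves printable characters (including combining marks) unescaped while ascii() keeps the Py2-style escapes; the helper name code_points is hypothetical. Both strings render as the same glyph in a Unicode-aware terminal, yet their lengths differ.

```python
import unicodedata

decomposed = "e\u0301"                               # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9


def code_points(s: str) -> list[str]:
    """List the Unicode name of every code point in s (hypothetical helper)."""
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in s]


print(len(decomposed), len(composed))  # 2 1
print(repr(decomposed))                # one glyph on screen, two code points inside
print(ascii(decomposed))               # 'e\u0301'  (code points visible)
print(ascii(composed))                 # '\xe9'
print(code_points(decomposed))         # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```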

- Characters defined in the Unicode character database as [snip]

This is all very nice, but if that means that the whole Unicode database has to be loaded every time the interpreter starts up, as you indicated on the ticket, then I'm firmly -1 on that.

I changed the patch to add a flag to the PyUnicodeTypeRecords table, so the Unicode database is not loaded at startup.

Thanks.

Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED isn't all that intuitive.

Please also add your definition from the PEP to unicodectype.c, since this property is not defined by the Unicode standard.

I'd also appreciate it if you could make that property available as a Unicode method, e.g. .isprintable().

This addition is good on its own.
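The property discussed here is available in Python 3 as str.isprintable(). The sketch below restates the rule in pure Python, assuming it is "everything outside the Unicode general categories Other (C*) and Separator (Z*) is printable, plus the ASCII space"; pep3138_isprintable is a hypothetical name. Note that it consults the unicodedata module, i.e. the full database that the flag in the C-level patch is meant to avoid loading at startup.

```python
import unicodedata


def pep3138_isprintable(ch: str) -> bool:
    """Pure-Python restatement of the printability rule (hypothetical name).

    A character is non-printable if its general category is Other
    (Cc, Cf, Cs, Co, Cn) or Separator (Zs, Zl, Zp), except for the
    ASCII space, which stays printable.
    """
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] not in ("C", "Z")


samples = ["a", " ", "\n", "\x7f", "\u00a0", "\u0301", "\u200b", "\u2028", "\u65e5"]
for ch in samples:
    # Compare the sketch against the built-in method.
    print(f"U+{ord(ch):04X}", pep3138_isprintable(ch), ch.isprintable())
```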

I proposed to make the Unicode repr() output a regular encoding that's being implemented by a codec. You could then easily change the encoding to whatever you need for your application or console.

I think a global setting is not flexible enough. And I see no benefit to a customizable repr() except to keep compatibility with Python 2, but I think it is easy to migrate existing code to Py3k.

That's what I don't see in your PEP.
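The encoding-driven repr() idea can be approximated in Python 3 without writing a new codec: take repr() and re-encode it for a chosen target encoding with the 'backslashreplace' error handler, so characters the target can represent stay readable and everything else is escaped. This is a sketch under that assumption, not the codec-based design proposed in the thread; repr_for is a hypothetical helper.

```python
def repr_for(obj: object, encoding: str) -> str:
    """repr() of obj, escaped just enough for a target encoding (hypothetical helper)."""
    # Characters the target encoding cannot represent become \x, \u or \U escapes.
    return repr(obj).encode(encoding, errors="backslashreplace").decode(encoding)


text = "Grüße 日本"
print(repr_for(text, "ascii"))    # 'Gr\xfc\xdfe \u65e5\u672c'  (same as ascii(text))
print(repr_for(text, "latin-1"))  # 'Grüße \u65e5\u672c'
print(repr_for(text, "utf-8"))    # 'Grüße 日本'
```

With "ascii" as the target, the result matches Python 3's ascii() builtin, which is defined as repr() with non-ASCII characters escaped.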

How can things easily be changed so that it's possible to get the Py2.x-style hex escaping back into Py3k without having to change all repr() calls and %r format markers for Unicode objects?

I can see your point with it being easier to read e.g. German, Japanese or Korean data, but it still has to be possible to use repr() for proper debugging which allows the user to actually see what is stored in a Unicode object in terms of code points.
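For the interactive prompt only, Py2.x-style hex escaping can be restored in Python 3 without touching any repr() call: ascii() and the %a format keep the old escaping, and sys.displayhook controls how the prompt echoes results. The hook below is a sketch modelled on the displayhook pseudo-code in the sys module documentation, with ascii() swapped in for repr(); ascii_displayhook is a hypothetical name, and it does not affect explicit repr() calls or %r markers in application code.

```python
import builtins
import sys


def ascii_displayhook(value: object) -> None:
    """Echo interactive results with ascii() instead of repr() (hypothetical name)."""
    if value is None:
        return
    builtins._ = None  # avoid recursion while formatting, as the default hook does
    sys.stdout.write(ascii(value) + "\n")
    builtins._ = value  # keep the usual '_' behaviour of the prompt


print("%a" % "Grüße")  # 'Gr\xfc\xdfe'
```

Assigning sys.displayhook = ascii_displayhook, for example in a file named by the PYTHONSTARTUP environment variable, makes the prompt echo 'Gr\xfc\xdfe' for the string 'Grüße'.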

Thanks,

Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, May 08 2008)

Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
        Registered at Amtsgericht Duesseldorf: HRB 46611


More information about the Python-3000 mailing list