[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Bengt Richter bokr at oz.net
Sat Feb 11 09:20:27 CET 2006

Previous message: [Python-Dev] release plan for 2.5 ?
Next message: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <guido at python.org> wrote:

>> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at arctrix.com>
>> >The backwards compatibility problems seem to be relatively minor.
>> >I only found one instance of breakage in the standard library. Note
>> >that my patch does not change PyObject_Str(); that would break
>> >massive amounts of code. Instead, I introduce a new function:
>> >PyString_New(). I'm not crazy about the name but I couldn't think
>> >of anything better.
>
>On 2/10/06, Bengt Richter <bokr at oz.net> wrote:
>> Should this not be coordinated with PEP 332?
>
>Probably.. But that PEP is rather incomplete. Wanna work on fixing that?

I'd be glad to add my thoughts, but first of course it's Skip's PEP, and Martin casts a long shadow when it comes to character coding issues that I suspect will have to be considered.

(E.g., if there is a b'...' literal for bytes, the actual characters of the source code itself that the literal is being expressed in could be ascii or latin-1 or utf-8 or utf-16le a la Microsoft, etc.) UIAM, I read that the source is at least temporarily normalized to Unicode, and then re-encoded (except now for string literals?) per coding cookie or other encoding inference. (I may be out of date, gotta catch up.)

If one way or the other a string literal is in Unicode, then presumably so is a byte string b'...' literal -- i.e. internally u"b'...'" just before being turned into bytes.

Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes for non-ascii and non-printables, to define the full 8 bits without encoding error? Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'), to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1'). (but how does this play with str being able to produce unicode? And when do these changes happen?) I guess I'm getting ahead of myself ;-)
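A minimal sketch of the mutable-versus-immutable distinction raised above, written against Python 3's built-in `bytes` and `bytearray` types, which postdate this message. The 'byte' codec named in the question is hypothetical and is not used here. The sample string and variable names are illustrative assumptions only.

```python
# Sketch only: Python 3 built-ins stand in for the ideas discussed;
# the hypothetical 'byte' codec from the message does not exist.
text = "abc\xe9"  # four characters, the last one (U+00E9) is non-ASCII

# Encoding via a specific codec yields an immutable byte string.
immutable = text.encode("latin-1")

# Building a bytearray with the same codec yields a mutable one.
mutable = bytearray(text, "latin-1")
mutable[0] = ord("A")  # in-place modification is allowed

# "ASCII + escapes for non-ASCII": the backslashreplace error handler
# substitutes an escape sequence instead of raising an encoding error.
escaped = text.encode("ascii", "backslashreplace")
```

Here the choice between mutable and immutable output is made by the constructor rather than by a codec name, which is one possible answer to the question, not the design the message proposes.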

So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.

I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt that anyone could then improve further. I don't know about an early deadline. I don't want to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my time more effectively ;-)

I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't know who else might be interested...

Regards, Bengt Richter
