[Python-Dev] PEP 332 revival in coordination with PEP 349? [Was: Re: release plan for 2.5?]

Greg Ewing greg.ewing at canterbury.ac.nz
Thu Feb 16 00:54:56 CET 2006


Ron Adam wrote:

> I was presuming it would be done in C code and it will just need a pointer to the first byte, memchr(), and then read n bytes directly into a new memory range via memcpy().

If the object supports the buffer interface, it can be done that way. But if not, it would seem to make sense to fall back on the iterator protocol.
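The two paths described above can be sketched in modern Python, where bytes() already behaves this way: an object exposing the buffer interface can be copied wholesale, while any other iterable of integers 0..255 is consumed item by item through the iterator protocol. (This is an illustrative sketch with today's built-ins, not the C-level implementation being discussed.)

```python
import array

# array.array exposes the buffer interface, so the copy can be one
# memcpy-style operation at the C level.
buf = array.array('B', [1, 2, 3])
fast = bytes(buf)

# A generator has no buffer interface; bytes() falls back to iterating
# and checking each item is in range 0..255.
gen = (b for b in [1, 2, 3])
slow = bytes(gen)

assert fast == slow == b'\x01\x02\x03'
```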

> However, if it's done with a Python iterator and each item is then translated to bytes in a sequence (much slower), an encoding will need to be known for it to work correctly.

No, it won't. When using the bytes(x) form, encoding has nothing to do with it. It's purely a conversion from one representation of an array of values 0..255 to another.

When you do want to perform encoding, you use bytes(u, encoding) and say what encoding you want to use.
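The distinction between the two forms can be seen directly in modern Python, where both spellings exist much as proposed here (a small sketch, not part of the original discussion):

```python
# bytes(x): pure re-packaging of values 0..255 -- no encoding involved.
raw = bytes([72, 105])
assert raw == b'Hi'

# bytes(u, encoding): an explicit encode step from text to bytes.
# The same text yields different byte sequences under different encodings,
# which is exactly why the encoding must be stated.
utf8 = bytes('héllo', 'utf-8')
latin1 = bytes('héllo', 'latin-1')
assert utf8 != latin1
```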

> Unfortunately Unicode strings don't set an attribute to indicate their own encoding.

I think you don't understand what an encoding is. Unicode strings don't have an encoding, because they're not encoded! Encoding is what happens when you go from a Unicode string to something else.

> Since some longs will be of different length, yes, a bytes(0L) could give differing results on different platforms,

It's not just a matter of length. I'm not sure of the details, but I believe longs are currently stored as an array of 16-bit chunks, of which only 15 bits are used. I'm having trouble imagining a use for low-level access to that format, other than treating it as an opaque lump of data for turning back into a long later -- in which case, why not just leave it as a long in the first place?
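This is in fact how things turned out: rather than exposing the interpreter's internal digit array (15-bit chunks in the era discussed, typically 30-bit on current builds), Python later grew int.to_bytes()/int.from_bytes(), where the caller chooses a width and byte order and gets a platform-independent result. A brief illustration with today's API:

```python
n = 123456789  # 0x075BCD15

# The caller, not the platform, decides the width and byte order,
# so the result is the same everywhere.
data = n.to_bytes(4, 'big')
assert data == b'\x07\x5b\xcd\x15'

# Round-trips without any knowledge of the internal representation.
assert int.from_bytes(data, 'big') == n
```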

Greg
