[Python-Dev] methods on the bytes object

"Martin v. Löwis" martin at v.loewis.de
Sun Apr 30 20:52:02 CEST 2006

Previous message: [Python-Dev] methods on the bytes object
Next message: [Python-Dev] PEP 3101: Advanced String Formatting

Josiah Carlson wrote:

I think what you are missing is that algorithms that currently operate on byte strings should be reformulated to operate on character strings, not reformulated to operate on bytes objects.

By "character strings" can I assume you mean unicode strings which contain data, and not some new "character string" type?

I mean unicode strings, period. I can't imagine what "unicode strings which do not contain data" could be.

I know I must have missed some conversation. I was under the impression that in Py3k:

Python 1.x and 2.x str -> mutable bytes object

No. Python 1.x and 2.x str -> str, Python 2.x unicode -> str.
In addition, a bytes type is added, so that Python 1.x and 2.x str -> bytes.

The problem is that the current string type is used both to represent bytes and characters. Current applications of str need to be studied, and converted appropriately, depending on whether they use "str-as-bytes" or "str-as-characters". The "default", in some sense of that word, is that str applications are assumed to operate on character strings; this is achieved by making string literals objects of the character string type.

I was also under the impression that str.encode(...) -> bytes, bytes.decode(...) -> str

Correct.
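The round trip confirmed here can be sketched as follows. This is only an illustration, written against released Python 3, which postdates this message; the sample text and the choice of UTF-8 are assumptions made for the example.

```python
# Assumed sample data; any text, and any codec able to represent it, would do.
text = "L\u00f6wis"

# str.encode(...) -> bytes
raw = text.encode("utf-8")

# bytes.decode(...) -> str
round_tripped = raw.decode("utf-8")

print(type(raw).__name__, type(round_tripped).__name__)
```

The encoded form is longer than the text here because the non-ASCII character takes two bytes in UTF-8, which is one reason the two types cannot be used interchangeably.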

and that there would be some magical argument to pass to file or open:
open(fn, 'rb', magicalparameter).read() -> bytes.

I think the precise details of that are still unclear. But yes, the plan is to have two file modes: one that returns character strings (type 'str') and one that returns type 'bytes'.
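A rough sketch of the two modes, as they ended up looking in released Python 3, where the mode string alone selects the returned type and no extra parameter is needed. The scratch file and its contents are assumptions made purely for the illustration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write("caf\u00e9\n".encode("utf-8"))

    # Text mode: read() returns str, decoded with the given encoding.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: read() returns bytes, with no decoding applied.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(type(as_text).__name__, type(as_bytes).__name__)
```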

I mention this because I do binary data handling, some ''.join(...) for IO buffers as Guido mentioned (because it is the fastest string concatenation available in Python 2.x), and from this particular conversation, it seems as though Python 3.x is going to lose some expressiveness and power.

You certainly need a "concatenate list of bytes into a single bytes". Apparently, Guido assumes that this can be done through bytes().join(...); I personally feel that this is over-generalization: if the only practical application of .join is the empty bytes object as separator, I think the method should be omitted.

Perhaps

bytes(...)

or bytes.join(...)

could work?
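For comparison, a small sketch of the join-based spelling under discussion, written against released Python 3, where bytes objects did end up with a join method. The buffer contents are assumed sample data.

```python
# Assumed sample buffers, standing in for chunks collected from I/O.
chunks = [b"GET / ", b"HTTP/1.0", b"\r\n"]

# Empty bytes object as separator: plain concatenation of the list.
joined = bytes().join(chunks)

# The literal spelling b"" is the same empty bytes object.
joined_literal = b"".join(chunks)

# A non-empty separator is also accepted.
lines = b"\r\n".join([b"a", b"b"])

print(joined, lines)
```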

Regards, Martin
