[Python-3000] Immutable bytes -- looking for volunteer
Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Wed Sep 26 22:00:56 CEST 2007
On Tuesday, 25-09-2007, at 17:22 -0700, Guido van Rossum wrote:
> OK. Though it's questionable even whether a slice of a mutable bytes object should return a mutable bytes object (as it is not a shared view). But as that is what PyBytes currently do it is certainly the easiest...
A slice of a list is a list, as it always has been, so letting slicing return the same type as the whole sequence is at least consistent and easy to explain. It is hard to say, though, what the typical use cases are.
OTOH I believe individual elements of mutable or immutable bytes should be ints. Here is why I think that the analogy between characters and bytes is not strong enough to let elements of bytes be bytes of length 1 just because strings do the same.
Bytes are often computed, while characters are often only copied from place to place. Arithmetic is defined on ints, but not on bytes sequences of length 1. This means that computing a bytes sequence from scratch requires explicit conversions between a byte represented by an int and a byte represented by bytes of length 1.
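To make the conversion cost concrete, here is a minimal sketch, assuming Python 3 semantics in which indexing a bytes object yields an int. The function names are hypothetical. The second function only simulates the alternative design (elements as bytes of length 1) by taking one-byte slices.

```python
def xor_mask_ints(data: bytes, key: int) -> bytes:
    """Elements are ints, so arithmetic applies to them directly."""
    return bytes(b ^ key for b in data)


def xor_mask_len1(data: bytes, key: int) -> bytes:
    """Simulate elements being bytes of length 1 by slicing.

    Each element has to be converted to an int before the arithmetic
    and back to a length-1 bytes object afterwards.
    """
    out = []
    for i in range(len(data)):
        element = data[i:i + 1]           # bytes of length 1
        value = ord(element)              # explicit conversion to int
        out.append(bytes([value ^ key]))  # explicit conversion back
    return b"".join(out)
```

Both functions compute the same result; the second one simply pays for two conversions per element.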
There is also a philosophical reason. The division of a string into characters is quite arbitrary: consider UTF-16 versus UTF-32, combining characters, the encoding of Hangul, orthographic peculiarities, proportional fonts, ligatures, variant selectors, etc. All of these obscure the concept of a character and of string length. Consider also that a sequence of characters might have been decoded from, or will be encoded into, a sequence of bytes with a different length. This means that having atomic string components is more a technical convenience than a fundamental necessity, that the very concept of a character in a Unicode world is arbitrary, and that the length of a string is more a technical detail of a representation than an inherent property of the text being represented. In short, the concept of a string is more fundamental than that of a character.
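A small sketch of how unstable "string length" is, assuming Python 3 and the standard unicodedata module. The helper `lengths` is a hypothetical name. The same text is measured in composed and decomposed form, and in two encodings.

```python
import unicodedata


def lengths(text: str) -> dict:
    """Return the size of the same text in three different units."""
    return {
        "code points": len(text),
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16 bytes": len(text.encode("utf-16-le")),
    }


# LATIN SMALL LETTER E WITH ACUTE, precomposed and decomposed
composed = unicodedata.normalize("NFC", "\u00e9")
decomposed = unicodedata.normalize("NFD", "\u00e9")

# One Hangul syllable (U+D55C) and its decomposition into three jamo
hangul_syllable = "\ud55c"
hangul_jamo = unicodedata.normalize("NFD", hangul_syllable)

for label, text in [("composed", composed), ("decomposed", decomposed),
                    ("syllable", hangul_syllable), ("jamo", hangul_jamo)]:
    print(label, lengths(text))
```

The composed and decomposed forms render as the same text, yet every one of the three counts differs between them.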
OTOH a byte count and byte offsets are usually important in protocols based on bytes (except for text files, when they encode human text). Individual bytes are in some sense delimited very sharply from each other, and the amount of information stored in one byte is very well defined. A single byte is a more important concept in a bytes world than a character is in a text world; it is not merely a sequence of length 1.
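As an illustration of why byte counts and offsets matter, here is a sketch of a made-up framing format, assuming Python 3 and the standard struct module. The layout (a 1-byte message type followed by a 2-byte big-endian payload length) and the names `HEADER`, `build_frame` and `parse_frame` are hypothetical, not taken from any real protocol.

```python
import struct

# Hypothetical header: 1-byte message type, 2-byte big-endian payload length
HEADER = struct.Struct(">BH")


def build_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with its type and its exact byte count."""
    return HEADER.pack(msg_type, len(payload)) + payload


def parse_frame(frame: bytes):
    """Split a frame into (msg_type, payload) using byte offsets only."""
    msg_type, length = HEADER.unpack_from(frame, 0)
    start = HEADER.size                    # payload begins right after the header
    payload = frame[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return msg_type, payload


frame = build_frame(7, b"hello")
print(frame[0])            # the type field, read as a plain int
print(parse_frame(frame))
```

Every field here is located by an exact byte offset, and the type field is naturally a small integer rather than a one-element sequence.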
Making characters different from strings would require creating a new type: the existing int type is not very appropriate for single characters because many properties differ, e.g. the effect of writing to a text file. To avoid the burden of creating a new type for a concept which is rarely useful in isolation, strings of length 1 have been reused. OTOH the existing int type seems appropriate for elements of bytes. They can easily be thought of as just integers in the range 0..255, and Python does not use separate integer types for different potential ranges.
If you really don't like ints there, I would prefer immutable bytes even as elements of mutable bytes. Such an element is just a value isomorphic to an int, not an object with its own state. Moreover, for atomic objects like individual bytes, mutability does not help performance, which would otherwise be a reason to use a mutable type for non-atomic objects even when conceptually they are identityless values (mutability often helps in such cases because an object can be constructed piece by piece).
-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/