[Python-3000] Thoughts on new I/O library and bytecode

Gareth McCaughan gareth.mccaughan at pobox.com
Sat Mar 3 17:02:52 CET 2007

Previous message: [Python-3000] PEP 3113 (Removal of Tuple Parameter Unpacking)
Next message: [Python-3000] Thoughts on new I/O library and bytecode
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tuesday 27 February 2007 00:39, Greg Ewing wrote:

> I can't help feeling the people arguing for b"..." as the repr format haven't really accepted the fact that text and binary data will be distinct things in py3k, and are thinking of bytes as being a replacement for the old string type. But that's not true -- most of the time, unicode will be the replacement for str when it is used to represent characters, and bytes will mostly be used only for non-text. [etc.]

... but Guido prefers to use b"..." as the repr format, on the grounds that byte-sequences quite often are lightly encoded text, and that when that's true it can be much better to report them as such.

Here's an ugly, impure, but possibly practical answer: give each bytes object a single-bit flag meaning something like "mostly textual"; make the bytes([1,2,3,4]) constructor set it to false, the b"abcde" constructor set it to true, and arbitrary operations on bytes objects do ... well, something plausible :-). (Textuality/non-textuality is generally preserved; combining textual and non-textual yields non-textual.) Then repr() can look at that flag and decide what to do on the basis of it.

This would mean that the implication x==y ==> repr(x)==repr(y) would fail; it can already fail when x,y are of different types (3==3.0; 1==True) and perhaps in some weird situations where they are of the same type (signed IEEE zeros). It would make the behaviour of repr() less predictable, and that's probably bad; it would mean (unlike the examples I gave above) that you can have x==y, with x and y of the same type, but have repr(x) and repr(y) not look at all similar.
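
For concreteness, the existing cases mentioned above can be checked directly in an interpreter. This only illustrates ordinary Python behaviour and is nothing specific to the flag idea; the name `pairs` is just for the example.

```python
# Pairs that compare equal but whose repr() differs.
pairs = [(3, 3.0), (1, True), (0.0, -0.0)]

for x, y in pairs:
    # Each pair is equal under ==, yet the two reprs are different strings.
    print(x == y, repr(x), repr(y))
```

In each of these cases the two reprs are at least recognisably related.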

Obviously the flag wouldn't affect comparisons or hashing.
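
A rough sketch of how such a flag might behave, written as a wrapper class rather than a change to bytes itself. Everything here is an assumption made for illustration: the class name `FlaggedBytes`, the two constructor helpers, the choice of `bytes([...])` as the non-textual repr, and the rule that concatenation is textual only when both operands are.

```python
class FlaggedBytes:
    """Hypothetical stand-in for a bytes object with a 'mostly textual' flag."""

    def __init__(self, data, textual):
        self.data = bytes(data)
        self.textual = bool(textual)

    @classmethod
    def from_literal(cls, data):
        # Plays the role of the b"abcde" constructor: flag is true.
        return cls(data, True)

    @classmethod
    def from_ints(cls, ints):
        # Plays the role of the bytes([1,2,3,4]) constructor: flag is false.
        return cls(ints, False)

    def __add__(self, other):
        if not isinstance(other, FlaggedBytes):
            return NotImplemented
        # Combining textual and non-textual yields non-textual.
        return FlaggedBytes(self.data + other.data,
                            self.textual and other.textual)

    def __eq__(self, other):
        if not isinstance(other, FlaggedBytes):
            return NotImplemented
        return self.data == other.data  # the flag is ignored

    def __hash__(self):
        return hash(self.data)  # the flag is ignored

    def __repr__(self):
        if self.textual:
            return repr(self.data)  # string-like form
        return f"bytes({list(self.data)!r})"  # assumed list-of-ints form


x = FlaggedBytes.from_literal(b"abc")
y = FlaggedBytes.from_ints([97, 98, 99])
print(x == y, hash(x) == hash(y))
print(repr(x), repr(y), repr(x + y))
```

Here x and y are equal and hash alike, but one prints in the string-like form and the other as a list of integers, which is exactly the unpredictability discussed above.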

I can't say I like this much -- it's exactly the sort of behaviour I've found painful in Perl, with too much magic happening behind the scenes for perhaps-insufficient reason -- but it still might be the best available compromise. (The other obvious compromise approach would be to sniff the contents of the bytes object and see whether it "looks" like a lightly-encoded string. That's a bit too much magic for fuzzy reasons too.)
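
The content-sniffing alternative could look something like the following. This is only a sketch under made-up assumptions: the function name `looks_textual`, the definition of "texty" bytes as printable ASCII plus tab, LF and CR, the 0.9 threshold, and the treatment of empty input are all arbitrary choices, which is rather the point about fuzziness.

```python
def looks_textual(data, threshold=0.9):
    """Hypothetical heuristic: does this byte sequence look like
    lightly-encoded text?"""
    if not data:
        return True  # arbitrary choice for the empty sequence
    # Count printable ASCII bytes plus tab, line feed and carriage return.
    texty = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return texty / len(data) >= threshold


print(looks_textual(b"GET / HTTP/1.1\r\n"))
print(looks_textual(bytes([1, 2, 3, 4])))
print(looks_textual(b"ab\x00\x01"))
```

Any such rule has borderline inputs whose repr would flip depending on the threshold chosen.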

-- Gareth McCaughan
