[Python-Dev] PEP 460 reboot

Terry Reedy tjreedy at udel.edu
Tue Jan 14 09:32:20 CET 2014

On 1/14/2014 12:03 AM, Guido van Rossum wrote:

On Mon, Jan 13, 2014 at 6:25 PM, Terry Reedy <tjreedy at udel.edu> wrote:

byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
b'\x00\x01\x02abcdef'

re.split produces [b'\x00', b'', b'\x02', b'', b'def']. The only ascii bias is the one already present in the representation of bytes, and the fact that Python code must have an ascii-compatible encoding.

I don't think it's that easy. Just searching for '{' is enough to break in surprising ways.

I see your point. The punning problem (a byte being both itself and a special indicator character) is worse with bytes formats than the similar pun with text, and the potential for mysterious bugs is greater. (This is related to why we split 'text' and 'bytes' to begin with.)

With text, we break the pun by doubling the character to escape the special meaning. This works because 1) % and { are relatively rare in text, 2) %% and {{ are grammatically incorrect, and 3) %, {, and especially %% and {{ stand out visually.
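
For reference, a minimal illustration of the doubling rule as it already works for str in both formatting styles. The sample strings are made up for this sketch.

```python
# Doubling the marker character escapes it in str formatting.
percent_style = '%d%% of %s' % (50, 'total')    # '%%' yields a literal '%'
brace_style = '{{{}}} is a set'.format('x')     # '{{' and '}}' yield literal braces

print(percent_style)  # 50% of total
print(brace_style)    # {x} is a set
```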

With bytes, 1) there is no reason why 37 (%) and 123 ({) should be rare, 2) there is no grammatical rule against the sequences 37, 37 or 123, 123, and 3) hex escapes \x25 and \x7b, which might appear in a bytes format, do not stand out as needing doubling.
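
A short sketch of point 3, and of how such bytes can arise from binary data without anyone typing % or {. The packed value 0x257B is an arbitrary choice for illustration.

```python
import struct

# A hex escape denotes exactly the same byte as the printable character.
hex_form = b'\x25\x7b'
print(hex_form == b'%{')   # True
print(list(hex_form))      # [37, 123]

# Binary data can contain these bytes by accident: a big-endian
# unsigned 16-bit integer with value 0x257B packs to the bytes 37, 123.
packed = struct.pack('>H', 0x257B)
print(packed)              # b'%{'
```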

My example above breaks if b'\x00' is replaced with b'\x7b'. Even if a doubling and undoubling rule were added, re.split could not be used to split the format bytes.

-- Terry Jan Reedy
