[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Nick Coghlan ncoghlan at gmail.com
Sun Jan 12 14:16:37 CET 2014


On 12 Jan 2014 21:53, "Juraj Sukop" <juraj.sukop at gmail.com> wrote:

> On Sun, Jan 12, 2014 at 2:35 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> On Sat, Jan 11, 2014 at 08:13:39PM -0200, Mariano Reingart wrote:
>> > AFAIK (and just for the record), there could be both Latin1 text and UTF-16
>> > in a PDF (and other encodings too), depending on the font used:
>> [...]
>> > In Python2, txt is just a str, but in Python3 handling everything as latin1
>> > string obviously doesn't work for TTF in this case.
>>
>> Nobody is suggesting that you use Latin-1 for everything. We're
>> suggesting that you use it for blobs of binary data that represent
>> arbitrary bytes. First you have to get your binary data in the first
>> place, using whatever technique is necessary.
>
> Just to check I understood what you are saying. Instead of writing:
>
>     content = b'\n'.join([
>         b'header',
>         b'part 2 %.3f' % number,
>         binary_image_data,
>         utf16_string.encode('utf-16be'),
>         b'trailer'])
>
> it should now look like:
>
>     content = '\n'.join([
>         'header',
>         'part 2 %.3f' % number,
>         binary_image_data.decode('latin-1'),
>         utf16_string.encode('utf-16be').decode('latin-1'),
>         'trailer']).encode('latin-1')

Why are you proposing to do the join in text space? Encode all the parts separately, then concatenate them with b'\n'.join() (or whatever separator is appropriate). It's only the text formatting operation that needs to be done in text space and then explicitly encoded (and this example doesn't even need latin-1; ASCII is sufficient):

    content = b'\n'.join([
        b'header',
        ('part 2 %.3f' % number).encode('ascii'),
        binary_image_data,
        utf16_string.encode('utf-16be'),
        b'trailer'])

> Correct?

My updated version above is the reasonable way to do it in Python 3, and the one I consider clearly superior to reintroducing implicit encoding to ASCII as part of the core text model.
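
For anyone who wants to run the pattern, here is a minimal self-contained sketch. The values assigned to number, binary_image_data and utf16_string are placeholders assumed for illustration; they do not come from the thread.

```python
# Placeholder inputs (assumptions for illustration only).
number = 3.14159
binary_image_data = bytes([0x00, 0xFF, 0x80, 0x0A])  # arbitrary binary blob
utf16_string = "h\u00e9llo"                          # text destined for UTF-16

content = b'\n'.join([
    b'header',
    # Format in text space, then encode explicitly; the result is pure ASCII.
    ('part 2 %.3f' % number).encode('ascii'),
    binary_image_data,                # already bytes, passed through untouched
    utf16_string.encode('utf-16be'),  # text encoded with its own codec
    b'trailer',
])

print(content)
```

Each part is turned into bytes on its own terms before the join, so no Latin-1 round trip is needed and the binary blob is never decoded.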

This is why I don't have a problem with PEP 460 as it stands - it's just syntactic sugar for something you can already do with b''.join(), and thus not particularly controversial.

It's only proposals that add any form of implicit encoding that silently switches from the text domain to the binary domain that conflict with the core Python 3 text model (although third party types remain largely free to do whatever they want).
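
A small sketch of that boundary as it stands today (the sample values are assumptions for illustration): joining a str into a bytes sequence is rejected rather than silently encoded, so the switch to the binary domain has to be spelled out.

```python
text_part = 'part 2 %.3f' % 1.5   # str, produced in the text domain
binary_part = b'\x89PNG'          # bytes (placeholder binary data)

# Mixing str into a bytes join is refused; nothing is encoded implicitly.
try:
    mixed = b'\n'.join([binary_part, text_part])
except TypeError as exc:
    error = exc
else:
    error = None

# The explicit version names the codec at the point of crossing domains.
explicit = b'\n'.join([binary_part, text_part.encode('ascii')])

print(type(error).__name__)
print(explicit)
```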

Cheers, Nick.

