[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Juraj Sukop juraj.sukop at gmail.com
Sat Jan 11 13:56:56 CET 2014


On Sat, Jan 11, 2014 at 6:36 AM, Steven D'Aprano <steve at pearwood.info> wrote:

I'm sorry, I don't understand what you mean here. I'm honestly not trying to be difficult, but you sound confident that you understand what you are doing, but your description doesn't make sense to me. To me, it looks like you are conflating bytes and ASCII characters, that is, assuming that characters "are" in some sense identical to their ASCII representation. Let me explain: The integer that in English is written as 100 is represented in memory as bytes 0x0064 (assuming a big-endian C short), so when you say "an integer is written down AS-IS" (emphasis added), to me that says that the PDF file includes the bytes 0x0064. But then you go on to write the three character string "100", which (assuming ASCII) is the bytes 0x313030. Going from the C short to the ASCII representation 0x313030 is nothing like inserting the int "as-is". To put it another way, the Python 2 '%d' format code does not just copy bytes.

Sorry, I should've included an example: when I said "as-is" I meant the characters "1", "0", "0", so that would be your "0x313030".
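To make the two readings of "as-is" concrete, here is a minimal sketch comparing the raw big-endian C short with the ASCII digits. The variable names are illustrative only and do not come from the thread.

```python
import struct

value = 100

# Raw big-endian C short: the in-memory form described in the quoted message.
raw = struct.pack(">h", value)

# ASCII decimal digits: what "as-is" was meant to describe.
text = str(value).encode("ascii")

print(raw.hex())   # 0064
print(text.hex())  # 313030
```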

If you consider PDF as binary with occasional pieces of ASCII text, then working with bytes makes sense. But I wonder whether it might be better to consider PDF as mostly text with some binary bytes. Even though the bulk of the PDF will be binary, the interesting bits are text. E.g. your example:

Even though the binary image data is probably much, much larger in length than the text shown above, it's (probably) trivial to deal with: convert your image data into bytes, decode those bytes into Latin-1, then concatenate the Latin-1 string into the text above.
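The suggestion above can be sketched roughly as follows. The image bytes and the surrounding "stream" text are hypothetical placeholders, not the example from the original message.

```python
# Hypothetical stand-in for binary image data: every possible byte value.
image_data = bytes(range(256))

# Latin-1 maps each byte 0..255 to the code point with the same number,
# so decoding never fails and never loses information.
as_text = image_data.decode("latin-1")

# Hypothetical surrounding text, assumed here only for illustration.
document = "stream\n" + as_text + "\nendstream\n"

# Encoding the whole string back with Latin-1 restores the original bytes.
output = document.encode("latin-1")
assert output == b"stream\n" + image_data + b"\nendstream\n"
```

The round trip is only lossless if the same codec is used on the way out and the text part contains no code points above 255.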

This is similar to what Chris Barker suggested. I'm not trying to be difficult here either, but please explain one thing to me. Treating bytes as if they were Latin-1 is a bad idea; that's why "%f" got dropped in the first place, right? How is it then all right to put an image inside a Unicode string?
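One way to make this concern concrete, as a sketch with made-up data: once binary data sits in a str, nothing marks it as "really bytes", so writing it out with any codec other than Latin-1 silently changes it.

```python
# Hypothetical binary payload containing bytes above 0x7F.
image_data = bytes([0x00, 0x7F, 0x80, 0xFF])

# Smuggle the bytes into a str via Latin-1.
as_text = image_data.decode("latin-1")

# Writing the str out with a different codec alters the payload:
# each code point above 0x7F becomes two bytes in UTF-8.
wrong = as_text.encode("utf-8")

print(wrong)                 # b'\x00\x7f\xc2\x80\xc3\xbf'
print(wrong == image_data)   # False
```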

Also, apart from the in/out conversions, do any other difficulties come to mind?

Please also take note that in Python 3.3 and better, the internal representation of Unicode strings containing only code points up to 255 (i.e. pure ASCII or pure Latin-1) is very efficient, using only one byte per character.

I guess you meant [C]Python...
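The storage claim can be checked on CPython 3.3 or later with a rough measurement like the one below. The helper name bytes_per_char is made up for this sketch, and the numbers are a CPython implementation detail; other implementations may report something else.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Compare strings of 2n and n copies so the fixed object header cancels out.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n

# Expected values on CPython 3.3+ (implementation-specific):
print(bytes_per_char("a"))           # 1  (ASCII)
print(bytes_per_char("\xe9"))        # 1  (Latin-1 range)
print(bytes_per_char("\u0100"))      # 2  (rest of the BMP)
print(bytes_per_char("\U00010000"))  # 4  (beyond the BMP)
```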

In any case, thanks for the detailed reply.


