[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

MRAB python at mrabarnett.plus.com
Sat Jan 11 20:22:30 CET 2014

Previous message: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5)
Next message: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2014-01-11 05:36, Steven D'Aprano wrote: [snip]

Latin-1 has the nice property that every byte decodes into the character with the same code point, and vice versa. So:

    for i in range(256):
        assert bytes([i]).decode('latin-1') == chr(i)
        assert chr(i).encode('latin-1') == bytes([i])

passes. It seems to me that your problem goes away if you use Unicode text with embedded binary data, rather than binary data with embedded ASCII text. Then when writing the file to disk, of course you encode it to Latin-1, either explicitly:

    pdf = ...  # Unicode string containing the PDF contents
    with open("outfile.pdf", "wb") as f:
        f.write(pdf.encode("latin-1"))

or implicitly:

    with open("outfile.pdf", "w", encoding="latin-1") as f:
        f.write(pdf)

[snip]

The second example won't work because you're forgetting about the handling of line endings in text mode.

Suppose you have some binary data bytes([10]).

You convert it into a Unicode string using Latin-1, giving '\n'.

You write it out to a file opened in text mode.

On Windows, that string '\n' will be written to the file as b'\r\n'.
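To make the line-ending problem concrete, here is a minimal sketch (not from the original thread) that carries an arbitrary binary payload as Latin-1 text and writes it through io.TextIOWrapper over an in-memory io.BytesIO instead of a real file. Passing newline="\r\n" forces the same translation that the default newline=None applies on Windows (where os.linesep is '\r\n'), so the effect can be observed on any platform. The payload bytes and the write_text helper are hypothetical, chosen only for illustration.

```python
import io

# Hypothetical binary payload; byte 10 (0x0A) is what text mode sees as '\n'.
payload = bytes([0x25, 0x50, 0x44, 0x46, 10, 0x00, 0xFF])

# Carry the bytes around as text, decoded with Latin-1 (lossless round trip).
text = payload.decode("latin-1")
assert text.encode("latin-1") == payload


def write_text(s, newline):
    """Hypothetical helper: write s through a Latin-1 text layer and
    return the raw bytes that reach the underlying binary buffer."""
    buf = io.BytesIO()
    wrapper = io.TextIOWrapper(buf, encoding="latin-1", newline=newline)
    wrapper.write(s)
    wrapper.flush()          # push buffered text down into buf
    data = buf.getvalue()    # read before close(), which also closes buf
    wrapper.close()
    return data


# Windows-style translation: every '\n' written becomes '\r\n',
# so the original byte 10 is corrupted into the pair 13, 10.
translated = write_text(text, "\r\n")
print(translated)  # b'%PDF\r\n\x00\xff'
assert translated != payload

# newline="" disables translation, so the bytes come back unchanged.
assert write_text(text, "") == payload
```

For comparison, newline="" (or "\n") turns the translation off, which is why the last assertion holds; the second example above opens the file with the default newline=None, which is where the problem comes from.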
