[Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5)
Steven D'Aprano steve at pearwood.info
Mon Jan 13 03:21:25 CET 2014
- Previous message: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5)
- Next message: [Python-Dev] Smuggling bytes into text (was Re: RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, Jan 13, 2014 at 01:03:15PM +1100, Steven D'Aprano wrote:
> code speaks louder than words:
> http://www.pearwood.info/ethandemo.py
[...]
> Ethan refers to code like:
>
>     template % ("срЃ".encode('cp1251').decode('latin-1'), 42, blob.decode('latin-1'))
>
> > You did say to use a text template to manipulate my data, and then write
> > it later, no? Well, this is what it would look like.
>
> If the text strings the user gives you are compatible with the encoding
> they specify, you don't need that. Just use:
>
>     ("срЃ", 42, blob.decode('latin-1'))
>
> It's the user's responsibility if they choose to specify an encoding
> which is more restrictive than the contents of some field. If they do
> that, they have to encode that field somehow, so they can treat it as a
> binary blob. You don't have to do this, and you certainly don't have to
> take perfectly good text and turn it into bytes then back to text just
> so you can insert it back into text. That would be silly.
It occurs to me that I do exactly that in my demo code :-)
In my defence, it was 1am when I wrote it, and I am a little unclear about Ethan's use-case: whether the entire file is supposed to be compatible with the cp1251 encoding (the example that he gives), or just individual fields in it. If I understood the requirements better, my code would probably be able to avoid some of those encodes/decodes, or I might even decide that working in the text domain is a mistake and instead we should look to smuggle text into bytes rather than the other way around.
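
For readers following along, here is a minimal sketch of the two directions discussed above. It is not the code from ethandemo.py; the template layout, the field names and the sample blob are made up for illustration. The first half smuggles bytes into text via latin-1, which works because latin-1 maps each byte 0..255 to the code point with the same number. The second half goes the other way and builds the record in the bytes domain, which assumes Python 3.5 or later, where bytes % args is available.

```python
# Hypothetical record layout and sample data, for illustration only.
template = "name=%s;count=%d;blob=%s"
blob = bytes([0x00, 0x7F, 0x80, 0xFF])   # arbitrary binary data
name = "срЃ"                             # text that cp1251 can represent

# Direction 1: smuggle bytes into text.
# latin-1 decoding never fails, and encoding back restores the same bytes.
smuggled = blob.decode('latin-1')
assert smuggled.encode('latin-1') == blob

# The double conversion from the quoted code: text -> cp1251 bytes -> latin-1 text.
name_smuggled = name.encode('cp1251').decode('latin-1')
record = template % (name_smuggled, 42, smuggled)

# A single latin-1 encode at write time yields the raw bytes:
# the name field in cp1251, the blob untouched.
raw_from_text = record.encode('latin-1')

# Direction 2: smuggle text into bytes (assumes Python 3.5+ bytes formatting).
raw_from_bytes = b"name=%s;count=%d;blob=%s" % (name.encode('cp1251'), 42, blob)

# Both routes produce the same bytes.
assert raw_from_text == raw_from_bytes
```

Under these assumptions the two routes give identical output, so the choice between them is about which domain the rest of the program prefers to work in, not about what ends up in the file.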
Regardless of which way you go, I'm not seeing that mixed bytes and text should be a reason to hold off migrating from 2 to 3. Which is where this discussion started days and days ago.
-- Steven