[Python-Dev] Bytes path support
Chris Angelico rosuav at gmail.com
Fri Aug 22 23:04:20 CEST 2014
- Previous message: [Python-Dev] Bytes path support
- Next message: [Python-Dev] Bytes path support
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is utf-8, but it is not both. Maybe you meant "or" instead of "of".
I'd assume "or" was meant there, rather than "of"; it's a common typo.
Not sure why 1251, specifically, but it's not uncommon for boundary code to attempt a decode along the lines of "try UTF-8, and if that fails, fall back to an eight-bit encoding". For my MUD clients, that's pretty much required; one of the servers I frequent is completely bytes-oriented, so whatever encoding one client uses will be dutifully echoed to every other client. Some clients correctly use UTF-8, but others use whatever they feel like; and since those naughty clients are mainly on Windows, I can reasonably guess that they'll be using CP-1252. So that's what I do: UTF-8, falling back on 1252. (It's also possible some clients will be using Latin-1, but 1252 is a superset of its printable characters.)
But it's important to note that this is a method of handling junk. It's not a design intention; this is for a situation where I really want to cope with any byte stream and attempt to display it as text. And if I get something that's neither UTF-8 nor CP-1252, I will display it wrongly, and there's nothing that can be done about that.
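For concreteness, here's a rough sketch of the "UTF-8, fall back on 1252" approach described above. It isn't the actual code from my clients; the function name decode_lenient is made up for illustration, and the choice of errors="replace" in the fallback is an assumption (Python's cp1252 codec leaves a handful of bytes undefined, so a strict decode could still raise).

```python
def decode_lenient(data: bytes) -> str:
    """Decode bytes for display: try UTF-8 first, then fall back to CP-1252.

    Hypothetical helper for illustration only; a junk-handling strategy,
    not a general-purpose encoding detector.
    """
    try:
        # Strict UTF-8: random eight-bit text very rarely happens to be valid.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: replace the few bytes cp1252 leaves undefined
        # (0x81, 0x8D, 0x8F, 0x90, 0x9D) with U+FFFD rather than raising.
        return data.decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(decode_lenient("café".encode("utf-8")))    # well-behaved client
    print(decode_lenient("café".encode("cp1252")))   # "naughty" Windows client
    print(decode_lenient("привет".encode("koi8-r"))) # neither: shown wrongly
```

The last call shows the limitation: bytes in some third encoding still decode "successfully" as CP-1252, just into the wrong characters.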
ChrisA