[Python-Dev] Bytes path support

Chris Angelico rosuav at gmail.com
Fri Aug 22 23:04:20 CEST 2014


On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:

> "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is utf-8, but it is not both. Maybe you meant "or" instead of "of".

I'd assume "or" was meant there, rather than "of"; it's a common typo.

Not sure why 1251, specifically, but it's not uncommon for boundary code to attempt a decode along the lines of "attempt UTF-8 decode, and if that fails, attempt an eight-bit decode". For my MUD clients, that's pretty much required; one of the servers I frequent is completely bytes-oriented, so whatever encoding one client uses will be dutifully echoed to every other client. There are some that correctly use UTF-8, but others use whatever they feel like; and since those naughty clients are mainly on Windows, I can reasonably guess that they'll be using CP-1252. So that's what I do: UTF-8, falling back on 1252. (It's also possible some clients will be using Latin-1, but 1252 is a superset of its printable characters.)
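A minimal sketch of that kind of fallback in Python, for illustration only. The helper name decode_lenient is hypothetical, and the use of errors="replace" on the second attempt is an assumption of this sketch: Python's cp1252 codec leaves a handful of bytes (0x81, for instance) unmapped, so a strict 1252 decode can still fail.

```python
def decode_lenient(data: bytes) -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else as CP-1252.

    Hypothetical helper for illustration; not taken from any real client.
    """
    try:
        # Strict decode: any byte sequence that is not valid UTF-8 raises.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Eight-bit fallback. A few bytes have no CP-1252 mapping in
        # Python's codec, so substitute U+FFFD instead of raising again.
        return data.decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(decode_lenient("café".encode("utf-8")))  # well-formed UTF-8
    print(decode_lenient(b"caf\xe9"))              # 0xE9 alone is not UTF-8
```

Trying UTF-8 first works because arbitrary eight-bit text is rarely valid UTF-8 by accident; the reverse order would never fall through, since nearly every byte decodes under 1252.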

But it's important to note that this is a method of handling junk. It's not a design intention; this is for a situation where I really want to cope with any byte stream and attempt to display it as text. And if I get something that's neither UTF-8 nor CP-1252, I will display it wrongly, and there's nothing that can be done about that.

ChrisA


