Issue 9660: PEP 383: socket module doesn't handle undecodable protocol or service names
The protocol and service/port number databases are typically implemented as text files on Unix and can contain non-ASCII names in any encoding (presumably for local services), but the socket module tries to decode them as strict UTF-8. In particular, getservbyport() and getnameinfo() will raise UnicodeError when a name is not valid UTF-8.
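To illustrate the failure without touching the real services database, here is a minimal sketch of the decoding step alone. The name "café" and its Latin-1 encoding are made up for the example, and decode_strict() is a hypothetical helper standing in for what the socket module does internally; it is not part of the module.

```python
def decode_strict(raw: bytes) -> str:
    """Decode a database name the way strict UTF-8 decoding would."""
    # "strict" is the default error handler, spelled out here for clarity.
    return raw.decode("utf-8", "strict")


# Hypothetical service name stored in a non-UTF-8 encoding (Latin-1).
latin1_name = "café".encode("latin-1")

try:
    decode_strict(latin1_name)
    failed = False
except UnicodeDecodeError as exc:
    # The lone 0xE9 byte is not a complete UTF-8 sequence.
    failed = True
    print("cannot decode:", exc)
```

Any byte string that is not valid UTF-8 takes the same path, so a single such entry in the database is enough to trigger the error.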
Attached is a patch for 3.2 to use the file system encoding and surrogateescape handler instead, in line with PEP 383. This is what Python already does for the passwd and group databases, and it will allow protocol and service names to be given correctly as command line arguments.
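The round trip that PEP 383 provides can be sketched in pure Python. This is only an illustration of the surrogateescape handler, not the patch itself: decode_name() and encode_name() are hypothetical helpers, and UTF-8 is assumed as the file system encoding (the real value depends on the locale).

```python
def decode_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a name, mapping undecodable bytes to lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_name(name: str, encoding: str = "utf-8") -> bytes:
    """Reverse decode_name(), turning escaped surrogates back into bytes."""
    return name.encode(encoding, "surrogateescape")


# Hypothetical service name containing a byte that is invalid as UTF-8.
raw = b"caf\xe9"

name = decode_name(raw)            # no exception; 0xE9 becomes U+DCE9
restored = encode_name(name)       # the original bytes come back unchanged
```

Because the undecodable byte survives as a surrogate code point, a name obtained this way (or passed in from the command line) can be encoded back to exactly the bytes stored in the database.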
Come to think of it, I'm not sure if the patch is correct for Windows, as PyUnicode_DecodeFSDefault() appears to do strict MBCS decoding by default (similarly with PyUnicode_FSConverter() for encoding). Can Windows return service names that won't decode with MBCS? Or does it use a different encoding? I don't have a system to experiment with.