Issue 8373: socket: AF_UNIX socket paths not handled according to PEP 383
In 3.x, the socket module assumes that AF_UNIX addresses use UTF-8 encoding - this means, for example, that accept() will raise UnicodeDecodeError if the peer socket path is not valid UTF-8, which could crash an unwary server.
Python 3.1.2 (r312:79147, Mar 23 2010, 19:02:21)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind(b"\xff")
>>> s.getsockname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte
I'm attaching a patch to handle socket paths according to PEP 383. Normally this would use PyUnicode_FSConverter, but there are a couple of ways in which the address handling currently differs from normal filename handling.
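For anyone unfamiliar with PEP 383, here is a minimal sketch of the round trip it provides. It is plain Python, not the patch's C code, and it hard-codes UTF-8 for reproducibility, whereas the patch would use the filesystem encoding. The function names are made up for illustration.

```python
# Sketch only: PEP 383 maps each undecodable byte 0x80..0xFF to a lone
# surrogate U+DC80..U+DCFF, so the original bytes can be recovered exactly.
# UTF-8 is an assumption here; the real code would use the filesystem encoding.

def decode_path(raw, encoding="utf-8"):
    """Decode a socket path without raising on invalid bytes."""
    return raw.decode(encoding, "surrogateescape")


def encode_path(text, encoding="utf-8"):
    """Reverse decode_path(), turning lone surrogates back into raw bytes."""
    return text.encode(encoding, "surrogateescape")


raw = b"\xff"                  # the path from the traceback above
text = decode_path(raw)        # no UnicodeDecodeError this time
restored = encode_path(text)   # identical to the original bytes
```

With this handler, the byte 0xff becomes the single code point U+DCFF, and encoding it again yields b"\xff".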
One is that embedded null bytes are passed through to the system instead of being rejected, which is needed for the Linux abstract namespace. These abstract addresses are returned as bytes objects, but they can currently be specified as strings with embedded null characters as well. The patch preserves this behaviour.
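To illustrate the abstract-namespace case, the following sketch binds a socket to a name starting with a null byte and reads it back. The name "issue8373-demo-<pid>" and the helper bind_abstract() are invented for this example. The helper returns None on systems where the abstract namespace is unavailable, since it is a Linux extension.

```python
import os
import socket
import sys


def bind_abstract(name):
    """Bind to an abstract-namespace address and return what getsockname()
    reports, or None if this platform cannot do that."""
    if not (sys.platform.startswith("linux") and hasattr(socket, "AF_UNIX")):
        return None  # the abstract namespace is Linux-specific
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        try:
            # The leading null byte selects the abstract namespace; it is
            # passed through to the system rather than rejected.
            s.bind(name)
        except OSError:
            return None
        return s.getsockname()


# Hypothetical name, made unique per process to avoid "address in use".
name = b"\0issue8373-demo-" + str(os.getpid()).encode("ascii")
reported = bind_abstract(name)
```

On Linux, the address should come back as a bytes object with the leading null byte intact. No filesystem entry is created, so there is nothing to unlink afterwards.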
The current code also accepts read-only buffer objects (it uses the "s#" format), so in order to accept these as well as bytearray filenames (which the posix module accepts), the patch simply accepts any single-segment buffer, read-only or not.
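As a rough Python-level picture of the argument handling described here (the patch itself does this in C, and to_sun_path() is a hypothetical name), a string is encoded with the filesystem encoding and anything else is treated as a contiguous buffer:

```python
import os


def to_sun_path(address):
    """Hypothetical helper: normalise an AF_UNIX address to raw bytes."""
    if isinstance(address, str):
        # os.fsencode() applies the filesystem encoding with the
        # surrogateescape handler on POSIX, as PEP 383 specifies.
        return os.fsencode(address)
    # memoryview() accepts any object exporting the buffer protocol,
    # whether read-only (bytes) or writable (bytearray).
    view = memoryview(address)
    if not view.contiguous:
        raise BufferError("address must be a single contiguous buffer")
    return view.tobytes()


from_str = to_sun_path("\0/com/example")         # embedded null is kept
from_bytearray = to_sun_path(bytearray(b"\xff"))
from_view = to_sun_path(memoryview(b"/tmp/sock"))
```

Note that embedded null characters are deliberately not rejected here, unlike in ordinary filename handling.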
This patch applies on top of the patches I submitted for issue #8372, so that it does not knowingly run past the end of sun_path.
I have updated the patches for Python 3.2. They are now simpler because they no longer support bytearray arguments: bytearrays are no longer used for filenames, and the existing code does not support them either.
I've put the docs and tests in one patch and made two separate patches for the code: one for use when the linux-pass-unterminated patch from issue #8372 is applied, and one for when it isn't.
One point I neglected to comment on before is the ability to specify an address in the Linux abstract namespace as a filesystem-encoded string prefixed with a null character. This may seem strange, but as well as simplifying the code, it does support an actual use case, as on Linux systems the abstract namespace is sometimes used to hold names based on real filesystem paths such as "\x00/var/run/hald/dbus-XAbemUfDyQ", or imaginary ones, such as "\x00/com/ubuntu/upstart". In fact, running "netstat" on my own system did not reveal any non-textual abstract names in use (although they are of course allowed).
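A small sketch of that string form, assuming a Linux system: the address is given as a str prefixed with a null character, and getsockname() is expected to report the same name encoded with the filesystem encoding. The path "/tmp/issue8373-demo-<pid>" is invented for the example and never touches the real filesystem.

```python
import os
import socket
import sys

# Hypothetical path-like abstract name, given as a str with a leading null.
text_name = "\0/tmp/issue8373-demo-%d" % os.getpid()

reported = None
if sys.platform.startswith("linux") and hasattr(socket, "AF_UNIX"):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        try:
            s.bind(text_name)             # str is encoded before use
            reported = s.getsockname()    # abstract names come back as bytes
        except OSError:
            reported = None               # e.g. sockets not permitted here

# What the string form should correspond to on the wire.
expected = os.fsencode(text_name)
```

Elsewhere, reported stays None because the abstract namespace does not exist outside Linux.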