The documentation for the 'y' format (PyArg_ParseTuple and friends) states: « The bytes object must not contain embedded NUL bytes; if it does, a TypeError exception is raised. » But reading Python/getargs.c shows that the strlen() check is actually missing from the code for 'y'.
Same issue for y#. The documentation says:
y# (...) This variant on s# doesn’t accept Unicode objects, only bytes-like objects.
s# (...) The string may contain embedded null bytes.
--
The documentation for y* could also mention that it accepts embedded null bytes.
--
grep 'PyArg_Parse[^"]\+"[^:;)"]*y[^*]' */*.c finds only uses of y# (no use of the y format):
- mmap_gfind(), mmap_write_method()
- oss_write(), oss_writeall()
- getsockaddrarg(), when s->sock_family == AF_PACKET
- sock_setsockopt(), if the option name is a string
- socket_inet_ntoa(), socket_inet_ntop()
These functions have to support embedded null bytes, so I think the y# documentation should state explicitly that embedded null bytes are accepted.
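Why the functions listed above rely on y#: they receive the data as a pointer plus an explicit length, so bytes after a NUL are still processed. The following is a minimal sketch, not CPython code; `find_bytes()` is a hypothetical helper that searches a buffer the way a y#-based method such as mmap's find might.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from CPython): return the offset of the first
   occurrence of needle[0..nlen) in hay[0..hlen), or -1 if it is absent.
   Only the explicit lengths are used, never strlen(), so embedded NUL
   bytes in either buffer are compared like any other byte. */
static long find_bytes(const char *hay, size_t hlen,
                       const char *needle, size_t nlen)
{
    if (nlen > hlen)
        return -1;
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}
```

Searching the 5-byte buffer {'a', 'b', '\0', 'c', 'd'} for the 2-byte pattern {'\0', 'c'} returns offset 2. A strlen()-based length would see only "ab" and never reach the bytes after the NUL.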
See also #8850: Remove "w" format of PyParse_ParseTuple().
--
About "y": the parser has to check for embedded NUL bytes, because the caller doesn't know the size of the buffer (and strlen() would give the wrong size).
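The reasoning about "y" can be illustrated with a small model. This is a hedged sketch, not the getargs.c source: `convert_y()` is a hypothetical stand-in for the converter, it returns -1 where CPython would set an exception, and it assumes (as for a bytes object) that the data is followed by a terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the 'y' converter (not CPython's code).
   The caller of 'y' only gets a char pointer and will later use strlen()
   to find the size, so the converter must reject any buffer whose strlen()
   differs from its real size. That happens exactly when the buffer contains
   an embedded NUL byte. Assumes buf[size] == '\0', as for a bytes object.
   Returns 0 and stores the pointer on success, -1 on rejection. */
static int convert_y(const char *buf, size_t size, const char **out)
{
    if (strlen(buf) != size)
        return -1;   /* embedded NUL: strlen() would report the wrong size */
    *out = buf;
    return 0;
}
```

For the 5-byte buffer "ab\0cd", strlen() returns 2, so a caller trusting the pointer alone would silently drop three bytes; rejecting the argument is the only safe option.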
I committed a bigger patch: r81973 fixes not only the "y" format but also "u" and "Z". It also adds many tests in test_getargs2.py for string formats (not all of them; e.g. "es" is not tested). Even though I consider this a bugfix, I don't want to backport it to 3.1 because it might break programs that rely on this strange behaviour.
History (Date / User / Action / Args)
2022-04-11 14:57:00 / admin / set / github: 52838
2010-06-13 20:06:19 / vstinner / set / status: open -> closed; resolution: fixed; messages: +