Error handling in PyUnicode_EncodeDecimal() is broken by design. The caller has to allocate the output buffer but cannot know how large it must be: each error handler produces output of variable length, and the function has no parameter to specify the size of the output buffer. I propose raising a ValueError if the error handler is anything other than "strict", and making this change in Python 2.7, 3.2 and 3.3. In the Python 2.7 code base, PyUnicode_EncodeDecimal() is always called with errors=NULL. In Python 3.x, the function is no longer called. The attached patch is for Python 3.2. See also issue #13093.
I'm only using the function with the NULL error handler. If I had to use 'xmlcharrefreplace', presumably I'd overallocate 'output' for the worst-case scenario: sizeof("&#1114111;") per encoded character. It's hard to tell whether people are using this feature. PyUnicode_EncodeDecimal() was always undocumented (#8646), but it is part of the official Unicode API.
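To make the buffer-size problem concrete, here is a minimal Python sketch. It uses the built-in "ascii" codec as a stand-in for the decimal codec (an assumption: PyUnicode_EncodeDecimal() itself is a C function and is not called here), and the helper names encoded_lengths and worst_case_buffer_size are hypothetical. It only illustrates that the amount of output per unencodable character depends on the error handler.

```python
# Stand-in demonstration: the "ascii" codec is used instead of the decimal
# codec, because the standard error handlers behave the same way for any
# character the codec cannot encode.
HANDLERS = ["ignore", "replace", "xmlcharrefreplace", "backslashreplace"]


def encoded_lengths(ch):
    """Return the output length produced by each error handler for one character."""
    return {handler: len(ch.encode("ascii", handler)) for handler in HANDLERS}


# Longest character reference: U+10FFFF encodes as "&#1114111;".
WORST_XMLCHARREF = len("\U0010ffff".encode("ascii", "xmlcharrefreplace"))


def worst_case_buffer_size(nchars):
    """Hypothetical helper: bytes to overallocate for nchars characters,
    plus one byte for a terminating NUL."""
    return nchars * WORST_XMLCHARREF + 1


if __name__ == "__main__":
    for ch in ["\u00e9", "\u20ac", "\U0010ffff"]:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
    print("worst case for 3 characters:", worst_case_buffer_size(3))
```

A caller who knows which handler is in use could overallocate this way, but a custom handler registered with codecs.register_error() can return a replacement of any length, so no fixed bound works in general.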
> I'm only using the function with the NULL error handler.

I don't think that anyone uses it with anything else. The function is used to prepare a string as input for a function that converts a string to an integer. I don't see how xmlcharrefreplace could be useful there.
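As a rough illustration of that use case, the following Python sketch approximates what a strict-only decimal encoding does before the result is handed to an integer parser. The function name encode_decimal_strict is hypothetical, and the mapping rules (decimal digits to ASCII digits, whitespace to a space, other Latin-1 characters passed through) are my reading of the C function's behaviour, not a copy of its implementation.

```python
import unicodedata


def encode_decimal_strict(text):
    """Approximate a strict decimal encoding (hypothetical sketch).

    Unicode decimal digits become ASCII digits, whitespace becomes a plain
    space, other Latin-1 characters pass through, and anything else raises
    UnicodeEncodeError, as the "strict" handler would.
    """
    out = []
    for pos, ch in enumerate(text):
        digit = unicodedata.decimal(ch, -1)  # -1 when ch is not a decimal digit
        if digit >= 0:
            out.append(str(digit))
        elif ch.isspace():
            out.append(" ")
        elif 0 < ord(ch) < 256:
            out.append(ch)
        else:
            raise UnicodeEncodeError(
                "decimal", text, pos, pos + 1, "invalid decimal Unicode string"
            )
    return "".join(out)


if __name__ == "__main__":
    arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE
    print(encode_decimal_strict(arabic_indic))
    print(int(encode_decimal_strict(arabic_indic)))
```

With the strict handler the output is never longer than the input, so the caller can size the buffer from the input length alone. A replacement such as a character reference would also not parse as an integer.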
Hum, I only changed PyUnicode_EncodeDecimal() in Python 3.3; I prefer not to touch the stable releases (2.7, 3.2).

New changeset a20fae95618c by Victor Stinner in branch 'default':
Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers
http://hg.python.org/cpython/rev/a20fae95618c

(Oops, I specified the wrong issue number: fixed in 9a712ad593bb)
History
Date | User | Action | Args
2022-04-11 14:57:24 | admin | set | github: 57661
2011-11-25 19:10:17 | vstinner | set | status: open -> closed; resolution: fixed; messages: +