Issue 7267: format method: c presentation type broken in 2.7

process

Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: eric.smith
Nosy List: BreamoreBoy, benjamin.peterson, doerwalter, eric.smith, ezio.melotti, francismb, jwilk, python-dev, serhiy.storchaka, terry.reedy, vstinner
Priority: high
Keywords: patch

Created on 2009-11-05 16:22 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded
issue7267.patch francismb, 2013-03-23 21:28
int_format_c.patch vstinner, 2013-07-02 00:34
int_format_c_warn.patch serhiy.storchaka, 2015-05-13 09:02
Messages (23)
msg94935 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-05 16:22
The c presentation type in the new format method from PEP 3101 seems to be broken:

Python 2.6.4 (r264:75706, Oct 27 2009, 15:18:04)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> u'{0:c}'.format(256)
u'\x00'

The PEP states:

'c' - Character. Converts the integer to the corresponding Unicode character before printing,

so I would have expected this to return u'\u0100' instead of u'\x00'.
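For comparison, here is a minimal sketch of the behaviour the report expects, assuming a Python 3 interpreter (where every str is Unicode). The helper name `format_c` is hypothetical and only wraps the built-in `str.format` call.

```python
# Assumption: Python 3, where str is always a Unicode string.
def format_c(code_point):
    """Format an integer with the 'c' presentation type (hypothetical helper)."""
    return '{0:c}'.format(code_point)


# 256 is U+0100, so the result is the single character '\u0100'.
print(ascii(format_c(256)))
```

This is the result the reporter expected from `u'{0:c}'.format(256)` on Python 2.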
msg94936 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-05 16:30
I'll look at it.
msg94969 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-06 14:09
This is a bug in the way ints and longs are formatted. They always do the formatting as str, then convert to unicode. This works everywhere except with the 'c' presentation type. I'm still trying to decide how best to handle this.
msg94972 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-06 14:52
I'd say that a value >= 128 should generate a Unicode string (as the PEP explicitly states that the value is a Unicode code point and not a byte value). However, str.format() doesn't seem to support mixing str and unicode anyway:

>>> '{0}'.format(u'\u3042')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3042' in position 0: ordinal not in range(128)

so str.format() might raise an OverflowError for values >= 128 (or >= 256?)
msg95113 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-10 13:20
> so str.format() might raise an OverflowError for values >= 128 (or >= 256?)

Maybe, but the issue you reported is in unicode.format() (not str.format()), and I think that should be fixed. I'm trying to think of how best to address it.

As for the second issue you raise (which I think is that str.format() can't take a unicode argument), would you mind opening a separate issue for this and assigning it to me? Thanks.
msg95115 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2009-11-10 13:58
Done: issue 7300.
msg98107 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-21 11:38
See also issue #7649.
msg98173 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-23 00:46
('%c' % 255) == chr(255) == '\xff'

'%c' % 256 raises an "OverflowError: unsigned byte integer is greater than maximum" and chr(256) raises a "ValueError: chr() arg not in range(256)". I prefer the second error ;-) str.format() should follow the same behaviour.

str is a byte string: it can be used to create a network packet or encode data into a byte stream. '%c' is useful for that, and str.format() should keep this nice feature.
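As an illustration only: Python 3.5 and later offer a comparable byte-oriented `%c` on the bytes type. The sketch below assumes such an interpreter; the helper name `pack_byte` is hypothetical.

```python
# Assumption: Python 3.5+, where bytes supports %-formatting.
def pack_byte(value):
    """Return a one-byte bytes object for value (hypothetical helper).

    Values outside range(256) make the underlying b'%c' raise OverflowError.
    """
    return b'%c' % value


print(pack_byte(255))  # a single byte, 0xff

try:
    pack_byte(256)
except OverflowError as exc:
    print('rejected:', exc)
```

A byte string built this way can be concatenated into a packet or a byte stream, which is the use case described above.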
msg100772 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-10 00:25
The u'{0:c}'.format(256) formatter is implemented in Objects/stringlib/formatter.h, and this C template is instantiated in... Python/formatter_string.c (and not Python/formatter_unicode.c). Extract of the formatter_unicode.c comment:

/* don't define FORMAT_LONG, FORMAT_FLOAT, and FORMAT_COMPLEX, since we can live with only the string versions of those. The builtin format() will convert them to unicode. */

format_int_or_long_internal() is instantiated (only once) with STRINGLIB_CHAR=char, so "numeric_char = (STRINGLIB_CHAR)x;" becomes "numeric_char = (char)x;" whereas x is a long in [0; 0x10ffff] (or [0; 0xffff] depending on the Python unicode build option).

I think that the 'c' format type should have its own function because format_int_or_long_internal() gets locale info, computes the number of digits, and does other things not related to just creating one character from its code (chr(code) / unichr(code)). But that's just a remark, it doesn't fix this issue.

To fix this issue, I think that FORMAT_LONG and the related templates should be instantiated twice (str & unicode).
msg185089 - (view) Author: Francis MB (francismb) * Date: 2013-03-23 20:52
In 2.7.3

>>> u'{0:c}'.format(127)
u'\x7f'
>>> u'{0:c}'.format(128)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    u'{0:c}'.format(128)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
>>> u'{0:c}'.format(255)
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    u'{0:c}'.format(255)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
>>> u'{0:c}'.format(256)
u'\x00'
>>> u'{0:c}'.format(257)
u'\x01'
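These results are consistent with the explanation given earlier in the thread: the integer is formatted as a byte string first and converted to unicode afterwards. Below is a rough model of that path, written for Python 3 purely for illustration. The function name is hypothetical, and masking with 0xFF stands in for the C cast to char.

```python
def simulate_py27_unicode_format_c(value):
    """Model the 2.7 path: keep one byte, then ASCII-decode it (hypothetical)."""
    truncated = bytes([value & 0xFF])  # like (char)x: only the low 8 bits survive
    return truncated.decode('ascii')   # stands in for the implicit str -> unicode step


for value in (127, 128, 255, 256, 257):
    try:
        print(value, ascii(simulate_py27_unicode_format_c(value)))
    except UnicodeDecodeError as exc:
        print(value, 'UnicodeDecodeError:', exc.reason)
```

Under this model 256 and 257 wrap around to 0 and 1, while 128 through 255 survive truncation but fail the ASCII decode, matching the session above.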
msg185092 - (view) Author: Francis MB (francismb) * Date: 2013-03-23 21:28
Adding a test that triggers the issue; let me know if it is enough.
msg192169 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-07-02 00:34
u'{0:c}'.format(256) calls 256.__format__('c'), which returns a str (bytes) object, so we must reject values outside range(0, 256). The real fix for this issue is to upgrade to Python 3. The attached patch works around the initial issue (u'{0:c}'.format(256)) by raising OverflowError in int.__format__('c') if the value is not in range(0, 256).
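The workaround described here amounts to a range check before the byte is produced. Below is a minimal Python 3 sketch of that idea, not the patch itself (the real change is in C). The function name and the error message wording are assumptions made for illustration.

```python
def int_format_c_checked(value):
    """Return a single byte for value, rejecting anything outside range(0, 256).

    Hypothetical model of the check; the message text is illustrative only.
    """
    if not 0 <= value < 256:
        raise OverflowError('%c arg not in range(256)')
    return bytes([value])


print(int_format_c_checked(65))  # a single byte, 0x41
```

With such a check, a call with 256 fails loudly instead of silently returning a NUL byte.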
msg217674 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-05-01 01:18
If the purpose of backporting .format was/is to help people writing forward-looking code, or now, to write 2&3 code, then it should work like .format in 3.x, at least when the format string is unicode.
msg242726 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-05-07 19:07
What harm, if any, can be done by applying the patch with Victor's workaround?
msg242753 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-08 10:33
Maybe just emit a warning in -3 mode?
msg243059 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-13 09:02
Here is a modification of Victor's patch that just emits a Py3k warning. Both ways, with OverflowError and Py3k DeprecationWarning, are good to me. What would you say about this Benjamin?
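To make the difference from the OverflowError approach concrete, here is a hedged Python 3 sketch of the warning variant. It keeps the old truncating result and only signals the problem. The function name and the warning text are hypothetical, and the real patch ties the warning to Python 2's -3 mode, which this sketch does not model.

```python
import warnings


def int_format_c_warn(value):
    """Warn about out-of-range values but keep the truncated result (hypothetical)."""
    if not 0 <= value < 256:
        warnings.warn('%c arg not in range(256)', DeprecationWarning, stacklevel=2)
    return bytes([value & 0xFF])  # old behaviour: only the low 8 bits are kept
```

Callers that ignore or filter out the warning still receive the wrapped byte, which is the objection raised later in the thread against needing a flag to see the bug.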
msg254373 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 08:59
Ping.
msg254376 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-09 10:14
> Both ways, with OverflowError and Py3k DeprecationWarning, are good to me. What would you say about this Benjamin?

I prefer an OverflowError. I don't like having to enable a flag to fix a bug :-( According to the issue title, it's really a bug: "format method: c presentation type *broken* in 2.7".

Note: The unit test may check the error message; currently the error message is irrelevant (it mentions unicode whereas bytes (str type) are used).

>>> format(-1, "c")
OverflowError: %c arg not in range(0x110000) (wide Python build)
msg254378 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-09 10:51
Then feel free to commit your patch please. It LGTM.
msg254379 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-09 11:22
New changeset 2f2c52c9ff38 by Victor Stinner in branch '2.7':
Issue #7267: format(int, 'c') now raises OverflowError when the argument is not
https://hg.python.org/cpython/rev/2f2c52c9ff38
msg254380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-09 11:23
> Then feel free to commit your patch please. It LGTM.

Thanks for the review ;-)

@Walter: Sorry for the late fix (6 years later!).
msg254383 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2015-11-09 12:38
Don't worry, I've switched to using Python 3 in 2012, where this isn't a problem. ;)
msg254391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-09 15:29
Walter Dörwald added the comment:
> Don't worry, I've switched to using Python 3 in 2012, where this isn't a problem. ;)

Wow, cool! We still have 1 or 2 customers stuck with Python 2, haha.
History
Date User Action Args
2022-04-11 14:56:54 admin set github: 51516
2015-11-09 19:16:53 berker.peksag set stage: commit review -> resolved
2015-11-09 15:29:28 vstinner set messages: +
2015-11-09 12:38:45 doerwalter set messages: +
2015-11-09 11:23:06 vstinner set status: open -> closed; resolution: fixed; messages: +
2015-11-09 11:22:22 python-dev set nosy: + python-dev; messages: +
2015-11-09 10:51:06 serhiy.storchaka set messages: +; stage: patch review -> commit review
2015-11-09 10:14:57 vstinner set messages: +
2015-11-09 08:59:00 serhiy.storchaka set messages: +
2015-06-10 18:54:25 jwilk set nosy: + jwilk
2015-05-19 09:19:01 serhiy.storchaka set nosy: + benjamin.peterson
2015-05-13 09:02:11 serhiy.storchaka set files: + int_format_c_warn.patch; messages: +
2015-05-08 10:33:20 serhiy.storchaka set nosy: + serhiy.storchaka; messages: +
2015-05-07 19:07:48 BreamoreBoy set nosy: + BreamoreBoy; messages: +
2014-05-01 01:35:46 terry.reedy set title: format method: c presentation type broken -> format method: c presentation type broken in 2.7
2014-05-01 01:18:37 terry.reedy set nosy: + terry.reedy; messages: +; stage: needs patch -> patch review
2013-07-02 00:34:30 vstinner set files: + int_format_c.patch; messages: +
2013-06-23 14:57:49 terry.reedy set stage: test needed -> needs patch
2013-03-23 21:28:00 francismb set files: + issue7267.patch; keywords: + patch; messages: +
2013-03-23 20:52:57 francismb set nosy: + francismb; messages: +
2011-11-19 14:03:05 ezio.melotti set versions: - Python 2.6
2010-03-10 00:25:42 vstinner set messages: +
2010-02-24 18:25:05 eric.smith set priority: normal -> high
2010-02-24 18:04:15 eric.smith set priority: normal
2010-01-23 00:46:34 vstinner set messages: +
2010-01-21 11:38:47 vstinner set nosy: + vstinner; messages: +
2010-01-14 00:11:48 ezio.melotti set nosy: + ezio.melotti; stage: test needed
2009-11-10 13:58:23 doerwalter set messages: +
2009-11-10 13:20:17 eric.smith set messages: +
2009-11-06 14:52:30 doerwalter set messages: +
2009-11-06 14:09:20 eric.smith set messages: +; versions: + Python 2.7
2009-11-05 16:30:22 eric.smith set assignee: eric.smith; messages: +; nosy: + eric.smith
2009-11-05 16:22:47 doerwalter create