msg165688 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-07-17 08:15 |
Yet another inconsistency in the charmap codec.
>>> import codecs
>>> codecs.charmap_decode(b'\x00', 'strict', '\U0002000B')
('𠀋', 1)
>>> codecs.charmap_decode(b'\x00', 'strict', {0: '\U0002000B'})
('𠀋', 1)
>>> codecs.charmap_decode(b'\x00', 'strict', {0: 0x2000B})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: character mapping must be in range(65536)
The suggested patch removes this unnecessary limitation in the charmap decoder. |
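For reference, here is a minimal sketch that exercises the three mapping forms mentioned in this message: a string, an int-to-str dict, and an int-to-int dict. It assumes Python 3.3 or later, where the int-to-int form is expected to accept non-BMP code points. The variable names are illustrative and do not come from the patch.

```python
import codecs

# The same non-BMP character (U+2000B) supplied through three mapping forms.
data = b'\x00'
as_string = codecs.charmap_decode(data, 'strict', '\U0002000B')
as_int2str = codecs.charmap_decode(data, 'strict', {0: '\U0002000B'})
as_int2int = codecs.charmap_decode(data, 'strict', {0: 0x2000B})

# Assumption: on Python 3.3+ all three forms give the same result.
print(as_string == as_int2str == as_int2int)
```

Each call returns a tuple of the decoded string and the number of bytes consumed.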
|
|
msg165690 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-07-17 08:54 |
Could you add a test to your patch? Is the issue 3.3-specific? |
|
|
msg165710 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-07-17 11:36 |
Fixing this for 3.2 and earlier is possible, but expensive, because of the narrow build limitation. If necessary, I will provide the patch, but it is easier to mark it as "wont fix" for pre-3.3 versions. Here are tests for charmap decoding. The tests cover not only this issue, but all previously uncovered cases with int2str and int2int mappings. |
|
|
msg165753 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2012-07-18 11:02 |
In 3.2, the narrow build is also broken when the "charmap" is a string: codecs.charmap_decode(b'\0', 'strict', '\U0002000B') returns ('𠀋', 1) with a wide unicode build, but ('\ud840', 1) with a narrow build. 3.2 could be fixed to allow characters up to sys.maxunicode, though. |
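The '\ud840' in the narrow-build result is the high half of the UTF-16 surrogate pair for U+2000B. The sketch below shows the standard UTF-16 split; `to_surrogate_pair` is a hypothetical helper written for illustration, not a function from CPython or the patch.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

high, low = to_surrogate_pair(0x2000B)
print(hex(high), hex(low))

# Cross-check against the built-in UTF-16 encoder (big-endian, no BOM).
encoded = '\U0002000B'.encode('utf-16-be')
```

A narrow build stores such a character as two code units, so a decoder that copies only one unit per input byte ends up with a lone high surrogate.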
|
|
msg165786 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-07-18 15:48 |
Well, here is a patch for 3.2. |
|
|
msg165796 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2012-07-18 20:26 |
About the patch for 3.2: "needed = 6 - extrachars" Where does this 6 come from? There is another part which uses this "extrachars". Why is the allocation strategy different here? |
|
|
msg165798 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-07-18 20:33 |
It's the same strategy. "needed = (targetsize - extrachars) + (targetsize << 2)". targetsize == 2. |
|
|
msg165801 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2012-07-18 21:07 |
Ah, I was worried about the possible quadratic behavior. So the other (existing) case is quadratic as well (I was misled by the <<, which made me think there was something clever there). That's good enough for 3.2, I guess. |
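To illustrate the general concern about quadratic behavior (this is not the CPython allocation code), the sketch below counts how many elements would be copied if every reallocation moved the buffer, comparing a fixed growth step with geometric growth. The function name, the step of 8, and the item count are assumptions chosen for the example.

```python
def copies_with_growth(n_items, grow):
    """Count elements copied while appending n_items to a growing buffer.

    `grow` maps the current capacity to the new capacity. Every resize is
    assumed to copy the whole existing contents (worst case for realloc).
    """
    capacity = 0
    size = 0
    copied = 0
    for _ in range(n_items):
        if size == capacity:
            copied += size
            capacity = grow(capacity)
        size += 1
    return copied

n = 10_000
fixed_step = copies_with_growth(n, lambda c: c + 8)          # constant increment
geometric = copies_with_growth(n, lambda c: max(1, c * 2))   # doubling

print(fixed_step, geometric)
```

With a constant increment the total copying grows roughly with the square of the output length, while doubling keeps it proportional to the length.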
|
|
msg170567 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-09-16 18:43 |
Ping. |
|
|
msg170913 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-09-21 20:14 |
Patches updated. Added a few new tests, used MAX_UNICODE, and slightly changed the extrachars growth step. |
|
|
msg171069 |
Author: Roundup Robot (python-dev)  |
Date: 2012-09-23 18:01 |
New changeset 620d23f7ad41 by Antoine Pitrou in branch '3.2': Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). http://hg.python.org/cpython/rev/620d23f7ad41 New changeset c64dec45d46f by Antoine Pitrou in branch 'default': Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). http://hg.python.org/cpython/rev/c64dec45d46f |
|
|
msg171070 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-09-23 18:02 |
Thank you, I've committed the patches. There was a test failure in test_codeccallbacks in 3.2, which I fixed simply by replacing sys.maxunicode with a hardcoded 0x110000. |
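A short sketch of why a hardcoded 0x110000 is a stable out-of-range value: it is the first integer past the last Unicode code point regardless of build, whereas sys.maxunicode is 0xFFFF on a narrow 3.2 build. The example assumes Python 3.3 or later, where sys.maxunicode is always 0x10FFFF, and the names `LIMIT`, `accepted`, and `rejected` are illustrative.

```python
import codecs
import sys

# On Python 3.3+ there is a single build flavour, so this prints 0x10ffff.
print(hex(sys.maxunicode))

LIMIT = 0x110000  # first integer that is not a valid code point

# The largest code point is accepted as an integer mapping value...
accepted = codecs.charmap_decode(b'\x00', 'strict', {0: LIMIT - 1})

# ...while LIMIT itself is rejected by the decoder.
try:
    codecs.charmap_decode(b'\x00', 'strict', {0: LIMIT})
    rejected = False
except TypeError:
    rejected = True

print(accepted, rejected)
```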
|
|
msg171814 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-10-02 16:39 |
We forgot about 2.7 (because I had not thought to apply it even to 3.2). Here is the backported patch. |
|
|
msg173356 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-10-19 19:05 |
The 2.7 patch is just a backport of the 3.2 patch (including Antoine's last fix). Please review and commit. |
|
|
msg175802 |
Author: Roundup Robot (python-dev)  |
Date: 2012-11-17 20:17 |
New changeset c7ce91756472 by Antoine Pitrou in branch '2.7': Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). http://hg.python.org/cpython/rev/c7ce91756472 |
|
|
msg175803 |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-11-17 20:17 |
Thanks for the backport, committed! |
|
|