Issue 19099: struct.pack fails first time with unicode fmt (original) (raw)

Created on 2013-09-27 08:28 by miwa, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
handle_ascii_range_unicode_in_struct_pack.patch vajrasky,2013-09-30 15:29 review
handle_ascii_range_unicode_in_struct_pack_v2.patch vajrasky,2013-10-01 03:52 review
struct_unicode_fmt.patch serhiy.storchaka,2013-10-01 08:27 review
fix_error_message_struct_Struct.patch vajrasky,2013-12-14 04:13 review
Messages (15)
msg198464 - (view) Author: Musashi Tamura (miwa) Date: 2013-09-27 08:28
C:\>python Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import struct >>> struct.pack(u'B',1) Traceback (most recent call last): File "", line 1, in TypeError: Struct() argument 1 must be string, not unicode >>> struct.pack(u'B',1) Traceback (most recent call last): File "", line 1, in TypeError: Struct() argument 1 must be string, not unicode >>> struct.pack('B',1) # this is ok '\x01' >>> struct.pack(u'B',1) '\x01'
msg198708 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-09-30 15:29
Here's the preliminary patch. I am assuming that we should accept unicode argument not reject it straight away. Python3 does that. >>> import struct >>> struct.pack('b', 3) b'\x03' >>> struct.pack(b'b', 3) b'\x03' >>> struct.pack(b'\xff', 3) Traceback (most recent call last): File "", line 1, in struct.error: bad char in struct format
msg198714 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-30 18:23
Struct constructor accepts only str and not unicode. But struct.pack() uses caching and it found Struct('B') in the cache (because u'B' and 'B' are equal and have same hash). I doubt we should fix this. Adding support of Unicode argument is new feature.
msg198716 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-30 18:40
Either way, the runtime inconsistency is a bug. Since we shouldn't break existing code, I would vote for always allowing unicode format strings, rather than always disallowing them. Another argument is that str and unicode are generally substituible in 2.x when they are pure ASCII (which they are here).
msg198747 - (view) Author: Musashi Tamura (miwa) Date: 2013-09-30 23:29
Thanks for feedback. I think it should be fixed with allowing unicode. "from __future__ import unicode_literals" may break existing code.
msg198749 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 03:52
Refactor test to clear the cache before using unicode format.
msg198754 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-01 08:27
struct.Struct() should be changed instead of struct.pack(). Here is a patch.
msg198758 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 08:57
Serhiy, you don't want to clear the cache in the test to simulate the bug? struct._clearcache() Other than that, should we raise struct.error instead of ValueError? Right now, the current behaviour in python 2.7 is: >>> struct.pack('\x80', 3) Traceback (most recent call last): File "", line 1, in struct.error: bad char in struct format But you are right. Struct() should have been changed instead of struct.pack.
msg198759 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-01 09:03
Nevermind about my comment about clearing cache. It only happens if we use struct.pack not struct.Struct.
msg198769 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-01 12:55
> Other than that, should we raise struct.error instead of ValueError? Python 3 raises UnicodeEncodeError. And Python 2 raises UnicodeEncodeError when coerce non-ASCII unicode to str.
msg203063 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-16 16:38
Any comments?
msg205571 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-08 15:45
New changeset 42d3afd29460 by Serhiy Storchaka in branch '2.7': Issue #19099: The struct module now supports Unicode format strings. http://hg.python.org/cpython/rev/42d3afd29460
msg205572 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-08 15:48
Fixed. Thank you Musashi for your report.
msg206165 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-12-14 04:13
Okay, I think the error message can be improved because in Python 2.7 we differentiate very clearly the string from the unicode. >>> import struct >>> struct.Struct(1) Traceback (most recent call last): File "", line 1, in TypeError: Struct() argument 1 must be string, not int But you can give unicode, right? >>> struct.Struct(u'b') <Struct object at 0x1f484b8> This is consistent with other example: >>> " cutecat ".strip(1) Traceback (most recent call last): File "", line 1, in TypeError: strip arg must be None, str or unicode What do you say, Serhiy? Here is the patch.
msg206220 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-12-15 09:31
Nevermind, I already created this issue. http://bugs.python.org/issue19985
History
Date User Action Args
2022-04-11 14:57:51 admin set github: 63298
2013-12-15 09:31:39 vajrasky set messages: +
2013-12-14 04:13:50 vajrasky set files: + fix_error_message_struct_Struct.patchmessages: +
2013-12-08 15:48:19 serhiy.storchaka set status: open -> closedresolution: fixedmessages: + stage: patch review -> resolved
2013-12-08 15:45:38 python-dev set nosy: + python-devmessages: +
2013-11-16 16:38:52 serhiy.storchaka set assignee: serhiy.storchakamessages: +
2013-10-01 12:55:19 serhiy.storchaka set messages: +
2013-10-01 09:03:56 vajrasky set messages: +
2013-10-01 08:57:48 vajrasky set messages: +
2013-10-01 08:27:30 serhiy.storchaka set files: + struct_unicode_fmt.patchtype: behavior -> enhancementmessages: + stage: patch review
2013-10-01 03:52:42 vajrasky set files: + handle_ascii_range_unicode_in_struct_pack_v2.patchmessages: +
2013-09-30 23:29:09 miwa set messages: +
2013-09-30 18:40:39 pitrou set messages: +
2013-09-30 18:23:39 serhiy.storchaka set nosy: + pitrou, mark.dickinson, meador.inge, serhiy.storchakamessages: +
2013-09-30 15:29:25 vajrasky set files: + handle_ascii_range_unicode_in_struct_pack.patchnosy: + vajraskymessages: + keywords: + patch
2013-09-27 08:28:07 miwa create