Issue 16979: Broken error handling in codecs.unicode_escape_decode()

Created on 2013-01-16 10:46 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded
unicode_escape_decode_error_handling-2.7.patch | serhiy.storchaka, 2013-01-25 22:58
unicode_escape_decode_error_handling-3.2.patch | serhiy.storchaka, 2013-01-25 22:58
unicode_escape_decode_error_handling-3.3.patch | serhiy.storchaka, 2013-01-25 22:58
unicode_escape_decode_error_handling-3.4.patch | serhiy.storchaka, 2013-01-25 22:58
Messages (7)
msg180077 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-16 10:46
After an illegal escape sequence, the error handler in unicode_escape_decode() swallows one or more of the bytes that follow it:

    >>> import codecs
    >>> codecs.unicode_escape_decode(br'\u!@#', 'replace')
    ('�', 5)
    >>> codecs.unicode_escape_decode(br'\u!@#$', 'replace')
    ('�@#$', 6)

raw_unicode_escape_decode() works correctly:

    >>> codecs.raw_unicode_escape_decode(br'\u!@#', 'replace')
    ('�!@#', 5)
    >>> codecs.raw_unicode_escape_decode(br'\u!@#$', 'replace')
    ('�!@#$', 6)

See also .
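For comparison, here is a small sketch of the behaviour one would expect once the fix is in place, assuming a Python 3 interpreter that already includes these changesets. The helper name `decode_both` is hypothetical: it decodes the same bytes with `codecs.unicode_escape_decode()` and `codecs.raw_unicode_escape_decode()` using the same error handler, so the two results can be compared. With the bug fixed, both should return U+FFFD followed by the untouched trailing bytes.

```python
import codecs


def decode_both(data, errors='replace'):
    """Decode `data` with both escape codecs (hypothetical helper).

    Returns a pair of (text, bytes_consumed) tuples: the first from
    unicode_escape_decode(), the second from raw_unicode_escape_decode().
    """
    escaped = codecs.unicode_escape_decode(data, errors)
    raw = codecs.raw_unicode_escape_decode(data, errors)
    return escaped, raw


# Inputs from the report: a "\u" escape followed by non-hex bytes.
for sample in (br'\u!@#', br'\u!@#$'):
    escaped, raw = decode_both(sample)
    # ascii() makes the U+FFFD replacement character visible as \ufffd.
    print(ascii(escaped), ascii(raw))
```

Only the truncated "\u" escape itself should be replaced (or dropped, with 'ignore'); the bytes after it are decoded as ordinary characters.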
msg180091 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-16 14:50
Here is a patch for 3.4. Patches for the other versions will differ substantially.
msg180634 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-25 22:58
Here is a set of patches for all versions (patch for 3.4 updated).
msg180857 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-28 14:20
Ezio, is this a good factorization?

    def check(self, coder):
        def checker(input, expect):
            self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        check = self.check(decode)
        check(b"[\\\n]", "[]")
        check(br'[\"]', '["]')
        check(br"[\']", "[']")
        # other 20 checks ...

And the same for test_escape_encode and for the bytes escape decoder.
msg180890 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2013-01-29 00:33
LGTM. If you want to push it even further, you could make a list of (input, expected) pairs and call check() in a loop. That way it will also be easier to refactor if/when we add subtests (#16997).
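The table-driven variant suggested here could look roughly like the following sketch. The class name `UnicodeEscapeDecodeTest` and the `CASES` attribute are hypothetical; the three (input, expected) pairs are the ones quoted in the previous message.

```python
import codecs
import unittest


class UnicodeEscapeDecodeTest(unittest.TestCase):
    # Hypothetical test class; the cases are the three shown in msg180857.
    CASES = [
        (b"[\\\n]", "[]"),   # backslash-newline is a line continuation
        (br'[\"]', '["]'),   # escaped double quote
        (br"[\']", "[']"),   # escaped single quote
    ]

    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        for data, expected in self.CASES:
            # The decoder returns (text, number_of_bytes_consumed).
            self.assertEqual(decode(data), (expected, len(data)))
```

Adding a case then means adding one tuple to the list, and the loop body is the single place where a subtest context could later be introduced.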
msg180896 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-29 08:53
New changeset a242ac99161f by Serhiy Storchaka in branch '2.7':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/a242ac99161f

New changeset 084bec5443d6 by Serhiy Storchaka in branch '3.2':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/084bec5443d6

New changeset 086defaf16fe by Serhiy Storchaka in branch '3.3':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/086defaf16fe

New changeset 218da678bb8b by Serhiy Storchaka in branch 'default':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/218da678bb8b
msg180897 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2013-01-29 09:48
Until subtests are added, explicit calls look better to me. Once subtests are available, we can simply add a subtest inside the helper function.
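A sketch of that plan, assuming `unittest.TestCase.subTest()`, which was added in Python 3.4 (the subtests feature referred to as #16997 in this thread): the helper from msg180857 is kept unchanged except that its assertion is wrapped in a subtest. The class name `EscapeDecodeSubtestTest` is hypothetical.

```python
import codecs
import unittest


class EscapeDecodeSubtestTest(unittest.TestCase):
    # Hypothetical class; only the helper body differs from msg180857.

    def check(self, coder):
        def checker(input, expect):
            # Each call becomes its own subtest, so one failing case
            # does not stop the remaining checks from running.
            with self.subTest(input=input, expect=expect):
                self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        check = self.check(codecs.unicode_escape_decode)
        check(b"[\\\n]", "[]")
        check(br'[\"]', '["]')
        check(br"[\']", "[']")
```

Under a test runner, a failing call is reported as a separate subtest failure labelled with its input and expect values, while the explicit check() calls in the test method stay as they are.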
History
Date User Action Args
2022-04-11 14:57:40 admin set github: 61183
2013-01-29 09:50:51 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2013-01-29 09:48:12 serhiy.storchaka set messages: +
2013-01-29 08:53:08 python-dev set nosy: + python-dev; messages: +
2013-01-29 00:33:37 ezio.melotti set messages: +
2013-01-28 14:20:52 serhiy.storchaka set messages: +
2013-01-25 22:58:19 serhiy.storchaka set files: + unicode_escape_decode_error_handling-2.7.patch, unicode_escape_decode_error_handling-3.2.patch, unicode_escape_decode_error_handling-3.3.patch, unicode_escape_decode_error_handling-3.4.patch; messages: +
2013-01-25 22:55:26 serhiy.storchaka set files: - unicode_escape_decode_error_handling-3.4.patch
2013-01-16 14:50:05 serhiy.storchaka set files: + unicode_escape_decode_error_handling-3.4.patch; messages: +; dependencies: + SystemError in codecs.unicode_escape_decode(); keywords: + patch; stage: needs patch -> patch review
2013-01-16 10:46:45 serhiy.storchaka create