msg180077 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-16 10:46 |
An error handler in unicode_escape_decode() eats at least one byte after an illegal escape sequence:

>>> import codecs
>>> codecs.unicode_escape_decode(br'\u!@#', 'replace')
('�', 5)
>>> codecs.unicode_escape_decode(br'\u!@#$', 'replace')
('�@#$', 6)

raw_unicode_escape_decode() works correctly:

>>> codecs.raw_unicode_escape_decode(br'\u!@#', 'replace')
('�!@#', 5)
>>> codecs.raw_unicode_escape_decode(br'\u!@#$', 'replace')
('�!@#$', 6)

See also . |
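The comparison above can also be scripted. The following is only a minimal sketch: the helper name `compare_decoders` is hypothetical and not part of `codecs`, and the printed results depend on whether the interpreter in use has this bug.

```python
import codecs


def compare_decoders(data, errors="replace"):
    """Decode the same bytes with both escape decoders (hypothetical helper).

    Returns a pair of (text, consumed) tuples: first from
    unicode_escape_decode(), then from raw_unicode_escape_decode().
    """
    return (
        codecs.unicode_escape_decode(data, errors),
        codecs.raw_unicode_escape_decode(data, errors),
    )


# The two inputs reported in this message.
for sample in (br'\u!@#', br'\u!@#$'):
    escaped, raw = compare_decoders(sample)
    verdict = "agree" if escaped == raw else "differ"
    print(sample, escaped, raw, verdict)
```

On an affected interpreter the two results should differ for both inputs, because only unicode_escape_decode() drops bytes after the truncated escape.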
|
|
msg180091 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-16 14:50 |
Here is a patch for 3.4. Patches for other versions will differ significantly. |
|
|
msg180634 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-25 22:58 |
Here is a set of patches for all versions (patch for 3.4 updated). |
|
|
msg180857 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-28 14:20 |
Ezio, is this a good factorization?

    def check(self, coder):
        def checker(input, expect):
            self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        check = self.check(decode)
        check(b"[\\\n]", "[]")
        check(br'[\"]', '["]')
        check(br"[\']", "[']")
        # other 20 checks ...

And the same for test_escape_encode and for the bytes escape decoder. |
|
|
msg180890 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2013-01-29 00:33 |
LGTM. If you want to push it even further, you could make a list of (input, expected) pairs and call check() in a loop. That way it will also be easier to refactor if/when we add subtests (#16997). |
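The list-and-loop variant suggested here might look roughly like the sketch below. The class name `EscapeDecodeLoopTest` and the `CASES` attribute are hypothetical; only the three (input, expected) pairs come from the snippet quoted earlier in this thread.

```python
import codecs
import unittest


class EscapeDecodeLoopTest(unittest.TestCase):  # hypothetical class name
    # (input, expected) pairs; the real test would list all of the checks.
    CASES = [
        (b"[\\\n]", "[]"),   # backslash-newline is skipped
        (br'[\"]', '["]'),   # escaped double quote
        (br"[\']", "[']"),   # escaped single quote
    ]

    def check(self, coder):
        # Same helper as in the proposed patch: the decoder must consume
        # the whole input and return the expected text.
        def checker(input, expect):
            self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        check = self.check(codecs.unicode_escape_decode)
        for data, expect in self.CASES:
            check(data, expect)
```

One trade-off of the loop is that the first failing pair stops the test, so later pairs are not reported until it is fixed.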
|
|
msg180896 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-01-29 08:53 |
New changeset a242ac99161f by Serhiy Storchaka in branch '2.7':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/a242ac99161f

New changeset 084bec5443d6 by Serhiy Storchaka in branch '3.2':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/084bec5443d6

New changeset 086defaf16fe by Serhiy Storchaka in branch '3.3':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/086defaf16fe

New changeset 218da678bb8b by Serhiy Storchaka in branch 'default':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/218da678bb8b |
|
|
msg180897 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-29 09:48 |
Until subtests are added, an explicit call looks better to me. And when subtests are added, we will just add a subtest inside the helper function. |
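As a rough illustration of that plan, the helper could wrap its assertion in a subtest while the call sites keep their explicit calls. This sketch assumes subtests take the form of the `TestCase.subTest()` context manager available in Python 3.4 and later; the class name `EscapeDecodeSubTest` is hypothetical.

```python
import codecs
import unittest


class EscapeDecodeSubTest(unittest.TestCase):  # hypothetical class name
    def check(self, coder):
        def checker(input, expect):
            # Assumption: subtests are provided by TestCase.subTest().
            # Each call is reported separately, so one failing input
            # does not hide the checks that follow it.
            with self.subTest(input=input):
                self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        check = self.check(codecs.unicode_escape_decode)
        # Explicit calls stay exactly as in the proposed factorization.
        check(b"[\\\n]", "[]")
        check(br'[\"]', '["]')
        check(br"[\']", "[']")
```

Only the helper changes; the test methods that call it are untouched.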
|
|