Issue 19279: UTF-7 decoder can produce inconsistent Unicode string

Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, barry, benjamin.peterson, ezio.melotti, georg.brandl, glebourgeois, larry, mcepl, mrabarnett, ncoghlan, piotr.dobrogost, python-dev, serhiy.storchaka, vstinner
Priority: release blocker
Keywords: patch

Created on 2013-10-17 09:55 by glebourgeois, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name, Uploaded
utf7_errors.patch, serhiy.storchaka, 2013-10-17 16:29
utf7_errors-2.7.patch, serhiy.storchaka, 2013-10-18 13:47
Messages (19)
msg200117 - (view) Author: Guillaume Lebourgeois (glebourgeois) Date: 2013-10-17 09:55
After fetching a web page with a wrongly declared encoding, converting it with the codecs module crashes. The issue can be reproduced this way:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml"
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
  File "", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
Original issue here: https://github.com/kennethreitz/requests/issues/1682
msg200132 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-10-17 14:54
The bytestring literal isn't valid. It starts with b" and later on has an unescaped " followed by more characters. Also, the usual way to decode is by using the .decode method. I get this:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel=\"alternate\" type=\"application/rss+xml\""
>>> content.decode("utf-7", "strict")
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    content.decode("utf-7", "strict")
  File "C:\Python33\lib\encodings\utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-5: partial character in shift sequence
msg200133 - (view) Author: Guillaume Lebourgeois (glebourgeois) Date: 2013-10-17 15:07
My fault, bad paste. I should have written:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
  File "", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
msg200134 - (view) Author: Guillaume Lebourgeois (glebourgeois) Date: 2013-10-17 15:13
"Also, the usual way to decode is by using the .decode method."
The original bug happened while using the requests library, so I have no control over which decoding method is used. But if you had used the "replace" error handler with your approach, you would have hit the same exception:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
  File "", line 1, in <module>
  File "/lib/python3.3/encodings/utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)
SystemError: invalid maximum character passed to PyUnicode_New
msg200135 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-10-17 15:41
Indeed, 'utf-7' and the 'replace' error handler don't get along in this case.
msg200136 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-10-17 15:41
That is, I can locally reproduce the behaviour Guillaume describes on the latest tip build.
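For callers who cannot choose how untrusted bytes get decoded, a defensive wrapper is one possible stopgap. The following is only a minimal sketch: the helper name `decode_untrusted` and the `latin-1` fallback are hypothetical choices for illustration, not part of requests or the codecs module, and catching SystemError only matters on interpreters that still have this bug.

```python
def decode_untrusted(data, declared_encoding, fallback="latin-1"):
    """Decode bytes whose declared encoding may be wrong (hypothetical helper)."""
    try:
        return data.decode(declared_encoding, "replace")
    except (LookupError, SystemError):
        # LookupError: the declared codec name is unknown.
        # SystemError: the crash reported in this issue on affected builds.
        # latin-1 maps every byte to a code point, so this fallback cannot fail.
        return data.decode(fallback, "replace")


# The byte string from the corrected reproduction above.
content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
text = decode_untrusted(content, "utf-7")
print(type(text).__name__)
```

The fallback trades correctness for robustness: the result may be mojibake, but the caller gets a string instead of an interpreter-level error.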
msg200144 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-17 16:29
Here is a patch for 3.3+. Other versions are affected too. They don't raise SystemError, but produce an illegal unicode string on a wide build. E.g. in Python 2.7:
>>> 'a+/,+IKw-b'.decode('utf-7', 'replace')
u'a\ufffd\U003f20acb'
\U003f20ac is an illegal code point. Because both the encoding and the encoded data can come from an external source, this could be used in security attacks.
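To make the "illegal code" remark concrete, the sketch below checks the reported value against the upper bound of the Unicode code space (U+10FFFF). The helper `is_valid_code_point` is a name introduced here for illustration. The bit arithmetic in the second check is an observation, not something stated in the thread: 0x3F is the base64 value of '/', and 'IKw' is the UTF-7 payload for U+20AC, so reading the result as stale bits carried over from the aborted shift sequence is an inference. The final decode assumes an interpreter that includes the fix.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def is_valid_code_point(cp):
    """Return True if cp lies inside the Unicode code space."""
    return 0 <= cp <= MAX_CODE_POINT


# The value from the Python 2.7 wide-build output above is out of range.
print(is_valid_code_point(0x3F20AC))  # False

# It equals the base64 value of '/' shifted left 16 bits, OR'ed with U+20AC.
print((0x3F << 16) | 0x20AC == 0x3F20AC)  # True

# Assumption: running on an interpreter that includes the fix.
decoded = b"a+/,+IKw-b".decode("utf-7", "replace")
all_valid = all(is_valid_code_point(ord(ch)) for ch in decoded)
print(all_valid)
```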
msg200253 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-18 13:47
And here is a patch for 2.7.
msg200263 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-10-18 14:33
2.6.9 doesn't produce a SystemError afaict:
Python 2.6.9rc1+ (unknown, Oct 18 2013, 10:29:22)
[GCC 4.4.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
u'\ud7dd\ufffd rel=\'stylesheet\' type=\'text\ufffdcss\' \ufffd>\n<link rel="alternate" type="application\ufffdrss\uc669\ufffd'
msg200264 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-10-18 14:36
On Oct 18, 2013, at 02:33 PM, Barry A. Warsaw wrote:
>2.6.9 doesn't produce a SystemError afaict:
Please note that 2.6.9 is security only, so the threshold for worrying about things is a remotely exploitable security vulnerability that cannot be reasonably worked around in Python code.
msg200353 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-10-19 01:24
Ping. Please fix before "beta 1".
msg200450 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-19 17:39
New changeset 214c0aac7540 by Serhiy Storchaka in branch '2.7':
Issue #19279: UTF-7 decoder no more produces illegal unicode strings.
http://hg.python.org/cpython/rev/214c0aac7540
New changeset f471f2f05621 by Serhiy Storchaka in branch '3.3':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/f471f2f05621
New changeset 7dde9c553f16 by Serhiy Storchaka in branch 'default':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/7dde9c553f16
msg200465 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-10-19 18:17
New changeset 73ab6aba24e5 by Serhiy Storchaka in branch '3.3':
Fixed tests for issue #19279.
http://hg.python.org/cpython/rev/73ab6aba24e5
msg201508 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-27 23:26
@Serhiy: What is the status of the issue?
msg201515 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-28 06:27
The bug is fixed in the maintenance releases. The maintainer of 3.2 can backport the fix to 3.2 if it is worth it.
msg207788 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-09 19:39
Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
msg215458 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-03 17:00
> Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
Ping?
msg222203 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-03 17:51
To repeat the question: do we or don't we fix this in 3.2?
msg222223 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-03 21:41
I suggest closing the issue. It's "just" another way to crash Python 3.2, like any other bug fix. Python 3.2 does not accept bug fixes anymore.
History
Date User Action Args
2022-04-11 14:57:52 admin set github: 63478
2014-07-04 18:39:11 serhiy.storchaka set title: UTF-7 can produce inconsistent Unicode string -> UTF-7 decoder can produce inconsistent Unicode string
2014-07-04 18:38:35 serhiy.storchaka set status: open -> closed; title: UTF-7 to UTF-8 decoding crash -> UTF-7 can produce inconsistent Unicode string; stage: patch review -> resolved; resolution: fixed; versions: + Python 2.7, Python 3.3, Python 3.4, - Python 3.2
2014-07-03 21:41:45 vstinner set messages: +
2014-07-03 17:51:26 BreamoreBoy set nosy: + BreamoreBoy; messages: +
2014-04-03 17:00:34 vstinner set messages: +
2014-01-09 19:39:08 serhiy.storchaka set messages: +
2013-11-22 07:09:49 mcepl set nosy: + mcepl
2013-10-28 06:27:12 serhiy.storchaka set messages: +
2013-10-27 23:26:38 vstinner set messages: +
2013-10-22 17:31:20 serhiy.storchaka set assignee: serhiy.storchaka -> (none); versions: - Python 2.7, Python 3.3, Python 3.4
2013-10-19 18:17:20 python-dev set messages: +
2013-10-19 17:39:55 python-dev set nosy: + python-dev; messages: +
2013-10-19 01:24:47 larry set messages: +
2013-10-18 14:40:57 barry set versions: - Python 2.6
2013-10-18 14:36:25 barry set messages: +
2013-10-18 14:33:18 barry set messages: +
2013-10-18 13:47:06 serhiy.storchaka set files: + utf7_errors-2.7.patch; messages: +
2013-10-18 10:32:53 piotr.dobrogost set nosy: + piotr.dobrogost
2013-10-17 16:29:57 serhiy.storchaka set files: + utf7_errors.patch; priority: normal -> release blocker; type: crash -> security; versions: + Python 2.6, Python 2.7, Python 3.2; keywords: + patch; nosy: + larry, benjamin.peterson, barry, georg.brandl; messages: +; stage: needs patch -> patch review
2013-10-17 15:41:54 ncoghlan set messages: +
2013-10-17 15:41:05 ncoghlan set nosy: + ncoghlan; messages: +
2013-10-17 15:13:00 glebourgeois set messages: +
2013-10-17 15:07:30 glebourgeois set messages: +
2013-10-17 14:54:02 mrabarnett set nosy: + mrabarnett; messages: +
2013-10-17 10:02:11 vstinner set nosy: + vstinner
2013-10-17 09:57:27 serhiy.storchaka set versions: + Python 3.4; nosy: + ezio.melotti, serhiy.storchaka; assignee: serhiy.storchaka; components: + Unicode; stage: needs patch
2013-10-17 09:55:36 glebourgeois create