msg200117 - (view) |
Author: Guillaume Lebourgeois (glebourgeois) |
Date: 2013-10-17 09:55 |
After fetching a web page with a wrongly declared encoding, converting its content with the codecs module crashes. The issue can be reproduced this way:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml"
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
  File "", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
Original issue here: https://github.com/kennethreitz/requests/issues/1682 |
|
|
msg200132 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2013-10-17 14:54 |
The bytestring literal isn't valid. It starts with b" and later on has an unescaped " followed by more characters. Also, the usual way to decode is by using the .decode method. I get this:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel=\"alternate\" type=\"application/rss+xml\""
>>> content.decode("utf-7", "strict")
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    content.decode("utf-7", "strict")
  File "C:\Python33\lib\encodings\utf_7.py", line 12, in decode
    return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-5: partial character in shift sequence |
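A minimal sketch of the strict-mode behaviour reported in this message, using only the standard library. The variable names are illustrative; the byte string is the corrected literal from this message, and the printed values are left for the reader to inspect rather than stated here.

```python
# The shift sequence "+1911" is cut short by the following "'" character,
# so the decoder sees a partial character inside the base64 section.
content = (
    b"+1911' rel='stylesheet' type='text/css' />\n"
    b"<link rel=\"alternate\" type=\"application/rss+xml\""
)

error = None
try:
    content.decode("utf-7", "strict")
except UnicodeDecodeError as exc:
    # UnicodeDecodeError exposes the offending byte range and the reason.
    error = exc
    print("reason:", exc.reason)
    print("offending bytes:", content[exc.start:exc.end])
```

With "strict" the failure is an ordinary UnicodeDecodeError that calling code can catch, which is the contrast with the SystemError reported for "replace".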
|
|
msg200133 - (view) |
Author: Guillaume Lebourgeois (glebourgeois) |
Date: 2013-10-17 15:07 |
My fault, bad paste. I should have written:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
  File "", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New |
|
|
msg200134 - (view) |
Author: Guillaume Lebourgeois (glebourgeois) |
Date: 2013-10-17 15:13 |
"Also, the usual way to decode by using the .decode method." The original bug happened using requests library, so I have no leverage on the used method for decoding. But if you used the "replace" mode with your methodology, you would have raised the same Exception : >>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml' >>> content.decode("utf-7", "replace") File "", line 1, in File "/lib/python3.3/encodings/utf_7.py", line 12, in decode return codecs.utf_7_decode(input, errors, True) SystemError: invalid maximum character passed to PyUnicode_New |
|
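For code that cannot choose how a library decodes, one possible stopgap on affected interpreters is to wrap the decode call. This is only a sketch: `decode_declared` and its `fallback` parameter are hypothetical names, not part of requests or the standard library, and catching SystemError is a workaround for the behaviour reported here rather than a recommended general practice.

```python
def decode_declared(data, declared_encoding, fallback="utf-8"):
    """Decode bytes with a declared encoding that may be wrong (hypothetical helper)."""
    try:
        return data.decode(declared_encoding, "replace")
    except (LookupError, SystemError):
        # LookupError: the declared codec name is unknown.
        # SystemError: what the affected 3.3 builds raised in this report.
        return data.decode(fallback, "replace")


content = (
    b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n'
    b'<link rel="alternate" type="application/rss+xml'
)
text = decode_declared(content, "utf-7")
print(repr(text))
```

On an interpreter that includes the fix, the first branch simply returns the decoded text with replacement characters where the shift sequences are malformed.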
|
msg200135 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-10-17 15:41 |
Indeed, 'utf-7' and the 'replace' error handler don't get along in this case. |
|
|
msg200136 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-10-17 15:41 |
That is, I can locally reproduce the behaviour Guillaume describes on the latest tip build. |
|
|
msg200144 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-17 16:29 |
Here is a patch for 3.3+. Other versions are affected too. They don't raise SystemError, but produce an illegal Unicode string on wide builds. E.g. in Python 2.7:
>>> 'a+/,+IKw-b'.decode('utf-7', 'replace')
u'a\ufffd\U003f20acb'
\U003f20ac is an illegal code point. Since the encoding and the encoded data can come from an external source, this can be used in security attacks. |
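The reported value can be reproduced with a little bit arithmetic. The sketch below assumes, as an inference from the number itself and not something stated in this thread, that the six bits of the stray "/" stayed in the decoder's bit buffer and ended up above the 16 bits decoded from "IKw". The helper `b64_bits` is illustrative only.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_bits(chars):
    """Concatenate the 6-bit values of base64 characters into one integer."""
    value = 0
    for ch in chars:
        value = (value << 6) | B64.index(ch)
    return value


stale = b64_bits("/")          # 6 bits from the broken "+/," sequence
unit = b64_bits("IKw") >> 2    # 18 bits read, the top 16 form one UTF-16 unit
leaked = (stale << 16) | unit  # assumed effect of a bit buffer that was not cleared

print(hex(stale), hex(unit), hex(leaked))
print("beyond the Unicode range:", leaked > 0x10FFFF)

# On an interpreter that includes the fix, every decoded code point is valid.
decoded = b"a+/,+IKw-b".decode("utf-7", "replace")
print(ascii(decoded))
print(all(ord(ch) <= 0x10FFFF for ch in decoded))
```

The largest valid code point is U+10FFFF, so any value built this way with non-zero stale bits above bit 20 cannot be stored in a legal string.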
|
|
msg200253 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-18 13:47 |
And here is a patch for 2.7. |
|
|
msg200263 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2013-10-18 14:33 |
2.6.9 doesn't produce a SystemError afaict:
Python 2.6.9rc1+ (unknown, Oct 18 2013, 10:29:22)
[GCC 4.4.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
u'\ud7dd\ufffd rel=\'stylesheet\' type=\'text\ufffdcss\' \ufffd>\n<link rel="alternate" type="application\ufffdrss\uc669\ufffd' |
|
|
msg200264 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2013-10-18 14:36 |
On Oct 18, 2013, at 02:33 PM, Barry A. Warsaw wrote:
>2.6.9 doesn't produce a SystemError afaict:
Please note that 2.6.9 is security only, so the threshold for worrying about things is a remotely exploitable security vulnerability that cannot be reasonably worked around in Python code. |
|
|
msg200353 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2013-10-19 01:24 |
Ping. Please fix before "beta 1". |
|
|
msg200450 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-19 17:39 |
New changeset 214c0aac7540 by Serhiy Storchaka in branch '2.7':
Issue #19279: UTF-7 decoder no more produces illegal unicode strings.
http://hg.python.org/cpython/rev/214c0aac7540
New changeset f471f2f05621 by Serhiy Storchaka in branch '3.3':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/f471f2f05621
New changeset 7dde9c553f16 by Serhiy Storchaka in branch 'default':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/7dde9c553f16 |
|
|
msg200465 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-19 18:17 |
New changeset 73ab6aba24e5 by Serhiy Storchaka in branch '3.3':
Fixed tests for issue #19279.
http://hg.python.org/cpython/rev/73ab6aba24e5 |
|
|
msg201508 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-27 23:26 |
@Serhiy: What is the status of the issue? |
|
|
msg201515 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-28 06:27 |
The bug is fixed in the maintenance releases. The maintainer of 3.2 can backport the fix to 3.2 if it is worth it. |
|
|
msg207788 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-01-09 19:39 |
Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7. |
|
|
msg215458 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-04-03 17:00 |
> Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
Ping? |
|
|
msg222203 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-07-03 17:51 |
To repeat the question: do we or don't we fix this in 3.2? |
|
|
msg222223 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-07-03 21:41 |
I suggest closing the issue. It's "just" another way to crash Python 3.2, and fixing it would be like any other bug fix. Python 3.2 does not accept bug fixes anymore. |
|
|