Issue 33529: [security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces

Created on 2018-05-16 00:12 by rad164, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7763 closed corona10,2018-11-07 01:33
PR 12020 merged python-dev,2019-02-24 17:49
PR 13321 merged miss-islington,2019-05-14 16:55
PR 14162 merged vstinner,2019-06-17 16:15
Messages (11)
msg316747 - (view) Author: Rad164 (rad164) Date: 2018-05-16 00:12
I just reported a bug about email folding in issue 33524, but this issue is more severe for languages such as Chinese or Japanese, which do not insert spaces between words. Python 3.6.5 has this issue, while 3.6.4 does not.

Create an email with a header longer than the max_line_length set by its policy, where the header contains non-ASCII characters but no whitespace. When you try to fold it, Python gets stuck and eventually the system hangs. There is no output until I stop it with Ctrl-C:

    ^CTraceback (most recent call last):
      File "emailtest.py", line 7, in <module>
        policy.fold("Subject", msg["Subject"])
      File "/usr/lib/python3.6/email/policy.py", line 183, in fold
        return self._fold(name, value, refold_binary=True)
      File "/usr/lib/python3.6/email/policy.py", line 205, in _fold
        return value.fold(policy=self)
      File "/usr/lib/python3.6/email/headerregistry.py", line 258, in fold
        return header.fold(policy=policy)
      File "/usr/lib/python3.6/email/_header_value_parser.py", line 144, in fold
        return _refold_parse_tree(self, policy=policy)
      File "/usr/lib/python3.6/email/_header_value_parser.py", line 2651, in _refold_parse_tree
        part.ew_combine_allowed, charset)
      File "/usr/lib/python3.6/email/_header_value_parser.py", line 2735, in _fold_as_ew
        ew = _ew.encode(first_part)
      File "/usr/lib/python3.6/email/_encoded_words.py", line 215, in encode
        blen = _cte_encode_length['b'](bstring)
      File "/usr/lib/python3.6/email/_encoded_words.py", line 130, in len_b
        groups_of_3, leftover = divmod(len(bstring), 3)
    KeyboardInterrupt

Code to reproduce:

    from email.message import EmailMessage
    from email.policy import default
    policy = default  # max_line_length = 78
    msg = EmailMessage()
    msg["Subject"] = "á"*100
    policy.fold("Subject", msg["Subject"])

There are no problems in the following cases:
1. The header is shorter than max_line_length.
2. The header can be split on spaces and every chunk is shorter than max_line_length.
3. The header is composed entirely of ASCII characters. In this case, there is no problem even if it is very long and has no spaces.
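Because the reproduction can hang the interpreter, it may be safer to run it in a child process with a timeout. The following is only a sketch under that assumption; the helper name fold_terminates and the 30-second default are illustrative choices, not part of the report.

```python
import subprocess
import sys

# The reproduction from the report, as source for a child interpreter.
# The non-ASCII character is written as an escape so the script text
# stays pure ASCII when passed on the command line.
REPRO = '''
from email.message import EmailMessage
from email.policy import default

msg = EmailMessage()
msg["Subject"] = "\\u00e1" * 100
default.fold("Subject", msg["Subject"])
'''


def fold_terminates(timeout=30.0):
    """Return True if the reproduction finishes within `timeout` seconds.

    Hypothetical helper: a timeout is treated as "the fold hangs".
    """
    try:
        subprocess.run([sys.executable, "-c", REPRO], timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return False
    return True


if __name__ == "__main__":
    print("fold terminates:", fold_terminates())
```

On an affected interpreter the helper is expected to return False after the timeout instead of blocking the calling process.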
msg319807 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-06-17 07:47
I tried the test case on the master branch. I ran it on a Linux-based DigitalOcean droplet with 1 GB of RAM, and the script was eventually killed. The results are below:

    # Python build
    ➜ cpython git:(master) ✗ ./python
    Python 3.8.0a0 (heads/bpo33095-add-reference:9d49f85, Jun 17 2018, 07:22:33)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

    # Test case
    ➜ cpython git:(master) ✗ cat foo.py
    from email.message import EmailMessage
    from email.policy import default
    policy = default  # max_line_length = 78
    msg = EmailMessage()
    msg["Subject"] = "á"*100
    policy.fold("Subject", msg["Subject"])

    # Test case execution
    ➜ cpython git:(master) ✗ time ./python foo.py
    [2] 13637 killed ./python foo.py
    ./python foo.py 387.36s user 3.85s system 90% cpu 7:11.94 total

    # I tried to do Ctrl + C after 2 minutes to stop and the stack trace is as below :
    ➜ cpython git:(master) ✗ time ./python foo.py
    ^CTraceback (most recent call last):
      File "foo.py", line 7, in <module>
        policy.fold("Subject", msg["Subject"])
      File "/root/cpython/Lib/email/policy.py", line 183, in fold
        return self._fold(name, value, refold_binary=True)
      File "/root/cpython/Lib/email/policy.py", line 205, in _fold
        return value.fold(policy=self)
      File "/root/cpython/Lib/email/headerregistry.py", line 258, in fold
        return header.fold(policy=policy)
      File "/root/cpython/Lib/email/_header_value_parser.py", line 144, in fold
        return _refold_parse_tree(self, policy=policy)
      File "/root/cpython/Lib/email/_header_value_parser.py", line 2650, in _refold_parse_tree
        part.ew_combine_allowed, charset)
      File "/root/cpython/Lib/email/_header_value_parser.py", line 2728, in _fold_as_ew
        ew = _ew.encode(first_part, charset=encode_as)
      File "/root/cpython/Lib/email/_encoded_words.py", line 226, in encode
        qlen = _cte_encode_length['q'](bstring)
      File "/root/cpython/Lib/email/_encoded_words.py", line 93, in len_q
        return sum(len(_q_byte_map[x]) for x in bstring)
      File "/root/cpython/Lib/email/_encoded_words.py", line 93, in <genexpr>
        return sum(len(_q_byte_map[x]) for x in bstring)
    KeyboardInterrupt
    ./python foo.py 131.41s user 0.43s system 98% cpu 2:13.89 total

Thanks
msg330925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-12-03 10:17
Since it's a denial of service that can be triggered by a user, I am marking this issue as a security issue. I may be wrong, but it seems that Python 2.7 isn't affected: Lib/email/_header_value_parser.py was added by bpo-12586 (commit 0b6f6c82b51b7071d88f48abb3192bf3dc2a2d24). Python 2.7 has neither this file nor policies.
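One way to check whether an interpreter ships the modules mentioned here is to look them up with importlib. This is a minimal sketch for Python 3 only (importlib.util does not exist on 2.7); the helper name has_module is hypothetical, and the presence of a module says nothing by itself about whether the bug is present.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of `name` does not exist.
        return False


if __name__ == "__main__":
    for module in ("email.policy", "email._header_value_parser"):
        print(module, has_module(module))
```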
msg342487 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-14 16:55
New changeset c1f5667be1e3ec5871560c677402c1252c6018a6 by Victor Stinner (Krzysztof Wojcik) in branch 'master': bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) https://github.com/python/cpython/commit/c1f5667be1e3ec5871560c677402c1252c6018a6
msg342512 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-14 20:12
New changeset 2fef5b01e36a17e36fd7e65c4b51f5ede8880dda by Victor Stinner (Miss Islington (bot)) in branch '3.7': bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) (GH-13321) https://github.com/python/cpython/commit/2fef5b01e36a17e36fd7e65c4b51f5ede8880dda
msg344561 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 12:50
Python 3.6, 3.5 and 2.7 are still vulnerable. Is anyone interested in backporting the fix?
msg345874 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 16:22
It's unclear to me whether Python 3.5 is affected. The fix changes the function _fold_as_ew(). Python 3.5 doesn't have this function, *but* there is a call to a _fold_as_ew() method!?

Lib/email/_header_value_parser.py:427, in the _fold() method:

    ...
    if is_ew or last_ew:
        # It's too big to fit on the line, but since we've
        # got encoded words we can use encoded word folding.
        part._fold_as_ew(folded)
        continue
    ...

If I backport the 2 tests, they fail *but* they don't hang forever (they complete in less than 1 second).

    ======================================================================
    FAIL: test_fold_overlong_words_using_RFC2047 (test.test_email.test_headerregistry.TestFolding)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/vstinner/prog/python/3.5/Lib/test/test_email/test_headerregistry.py", line 1601, in test_fold_overlong_words_using_RFC2047
        'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E'
    AssertionError: 'X-Report-Abuse: <https://www.mailitapp.com/report_abuse.p[50 chars]x>\n' != 'X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2E[114 chars]?=\n'
    - X-Report-Abuse: <https://www.mailitapp.com/report_abuse.php?mid=xxx-xxx-xxxxxxxxxxxxxxxxxxxxxxxx==-xxx-xx-xx>
    + X-Report-Abuse: =?utf-8?q?=3Chttps=3A//www=2Emailitapp=2Ecom/report=5Fabuse?=
    +  =?utf-8?q?=2Ephp=3Fmid=3Dxxx-xxx-xxxxxxxxxxxxxxxxxxxxxxxx=3D=3D-xxx-xx-xx?=
    +  =?utf-8?q?=3E?=

    ======================================================================
    FAIL: test_non_ascii_chars_do_not_cause_inf_loop (test.test_email.test_policy.PolicyAPITests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/vstinner/prog/python/3.5/Lib/test/test_email/test_policy.py", line 241, in test_non_ascii_chars_do_not_cause_inf_loop
        12 * ' =?utf-8?q?=C4=85?=\n')
    AssertionError: 'Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=\n' != 'Subject: \n =?utf-8?q?=C4=85?=\n =?utf-8?q?=C4=85?[209 chars]?=\n'
    - Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=
    + Subject: 
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
    +  =?utf-8?q?=C4=85?=
msg345878 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 16:46
Python 3.5 is not vulnerable, it doesn't hang on the following code:

    import email.policy
    policy = email.policy.default.clone(max_line_length=20)
    actual = policy.fold('Subject', '\u0105' * 12)
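To inspect what the fold in the snippet above returns on a given interpreter, the result can be split into lines and each line's length compared against max_line_length. This is a sketch only; fold_report is a hypothetical helper, and the exact folded output differs between Python versions, so none is shown here. On an affected interpreter the call would hang rather than return.

```python
import email.policy


def fold_report(value, max_line_length=20):
    """Fold `value` as a Subject header and return (length, line) pairs."""
    policy = email.policy.default.clone(max_line_length=max_line_length)
    folded = policy.fold("Subject", value)
    return [(len(line), line) for line in folded.splitlines()]


if __name__ == "__main__":
    for length, line in fold_report("\u0105" * 12):
        # Lines longer than max_line_length stand out in the first column.
        print(length, repr(line))
```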
msg345879 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 16:50
Python 2.7 doesn't have the email.policy module. For Python 2.7, I wrote this code:

---
import email.header
import email.message
msg = email.message.Message()
msg.set_charset("UTF-8")
msg['Subject'] = email.header.Header(u'\u0105' * 12, maxlinelen=20, charset="UTF-8")
print(msg.as_string())
---

I get this output:

---
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
 =?utf-8?b?xIU=?=
---

I have no idea whether this example shows that Python 2.7 is vulnerable or not. I get a different output on the master branch:

---
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=
---

But I don't know if I am using the email API properly. "Subject: =?utf-8?b?xIXEhcSFxIXEhcSFxIXEhcSFxIXEhcSF?=" is longer than 20 characters.
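For comparison, the legacy email.header API can also be exercised directly on Python 3 by calling Header.encode() instead of going through msg.as_string(). The sketch below assumes that direct call is a fair stand-in for the experiment above, which it may not be; it prints each encoded line with its length and checks that the encoded words decode back to the original text.

```python
from email.header import Header, decode_header, make_header

text = "\u0105" * 12

# Legacy header object with the same small line limit as in the experiment.
header = Header(text, charset="utf-8", maxlinelen=20)
encoded = header.encode()

for line in encoded.splitlines():
    print(len(line), repr(line))

# Round trip: decode the RFC 2047 encoded words back into a string.
decoded = str(make_header(decode_header(encoded)))
print(decoded == text)
```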
msg345938 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-06-18 00:14
New changeset 516a6a254814d2bc6a90290dfc44d77fdfb4050b by Ned Deily (Victor Stinner) in branch '3.6': bpo-33529, email: Fix infinite loop in email header encoding (GH-12020) (GH-14162) https://github.com/python/cpython/commit/516a6a254814d2bc6a90290dfc44d77fdfb4050b
msg345960 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-18 08:31
Using git bisect, I found which commit introduced the regression, bpo-27240:

    commit a87ba60fe56ae2ebe80ab9ada6d280a6a1f3d552
    Author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
    Date: Sun Dec 3 16:46:23 2017 -0800

    bpo-27240 Rewrite the email header folding algorithm. (GH-3488) (#4693)

    The original algorithm tried to delegate the folding to the tokens so that those tokens whose folding rules differed could specify the differences. However, this resulted in a lot of duplicated code because most of the rules were the same.

    The new algorithm moves all folding logic into a set of functions external to the token classes, but puts the information about which tokens can be folded in which ways on the tokens...with the exception of mime-parameters, which are a special case (which was not even implemented in the old folder).

    This algorithm can still probably be improved and hopefully simplified somewhat.

    Note that some of the test expectations are changed. I believe the changes are toward more desirable and consistent behavior: in general when (re) folding a line the canonical version of the tokens is generated, rather than preserving errors or extra whitespace.

    (cherry picked from commit 85d5c18c9d83a1d54eecc4c2ad4dce63194107c6)

The first vulnerable release is Python 3.6.4: Python 3.6.3 and older are not affected by this vulnerability. So yes, I confirm that Python 2.7 and 3.5 are not vulnerable.

By the way, a backport to 3.5 was requested but rejected :-) https://bugs.python.org/issue27240#msg330030

I am closing the issue. Thanks Rad164 for the report and thanks Krzysztof Wojcik for the fix!
History
Date User Action Args
2022-04-11 14:59:00 admin set github: 77710
2019-06-18 08:31:25 vstinner set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2019-06-18 00:14:04 ned.deily set nosy: + ned.deily; messages: +
2019-06-17 16:56:10 xtreak set nosy: + maxking
2019-06-17 16:50:46 vstinner set messages: +
2019-06-17 16:46:31 vstinner set messages: +
2019-06-17 16:22:02 vstinner set messages: +
2019-06-17 16:15:07 vstinner set pull_requests: + <pull_request14004>
2019-06-05 16:17:07 cheryl.sabella link issue34222 superseder
2019-06-04 12:50:39 vstinner set messages: +
2019-05-14 20:12:49 vstinner set messages: +
2019-05-14 16:55:43 miss-islington set pull_requests: + <pull_request13232>
2019-05-14 16:55:27 vstinner set messages: +
2019-02-24 17:49:12 python-dev set pull_requests: + <pull_request12052>
2018-12-03 10:17:52 vstinner set nosy: + vstinner; title: Infinite loop on folding email if headers has no spaces -> [security] Infinite loop on folding email (_fold_as_ew()) if an header has no spaces; messages: +; versions: + Python 3.7, Python 3.8; type: behavior -> security
2018-11-07 01:33:13 corona10 set pull_requests: + <pull_request9673>
2018-06-17 17:10:26 corona10 set pull_requests: - <pull_request7371>
2018-06-17 11:35:30 corona10 set keywords: + patch; stage: patch review; pull_requests: + <pull_request7371>
2018-06-17 07:47:24 xtreak set nosy: + xtreak; messages: +
2018-05-16 00:12:28 rad164 create