Issue 34222: Email message serialization enters an infinite loop when folding non-ASCII headers with long words

(Discovered together with https://bugs.python.org/msg322348)

Email message serialization (in function _fold_as_ew) enters an infinite loop when folding non-ASCII headers whose words (after encoding) are longer than the given maxlen.

Besides looping forever, it keeps appending to the lines list, so its memory usage grows without bound. The code keeps appending encoded empty strings to the list, like this:

lines: [ 'Subject: =?utf-8?q??=', ' =?utf-8?q??=', ' =?utf-8?q??=', ' =?utf-8?q??=', ' =?utf-8?q??=', ' =?utf-8?q??=', ' ' ] (and it keeps on growing)

Here is my code that can reproduce this issue (as a unittest):

    import email.generator
    import email.policy
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from unittest import TestCase


    def create_message(subject, sender, recipients, body):
        msg = MIMEMultipart()
        msg.set_charset('utf-8')
        msg.policy = email.policy.SMTP
        msg.attach(MIMEText(body, 'html'))
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = ';'.join(recipients)
        return msg


    class TestEmailMessage(TestCase):
        def _make_message(self, subject):
            return create_message(
                subject=subject,
                sender='me@site.com',
                recipients=['me@site.com'],
                body='Some text',
            )

        def test_ascii_message_with_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Q' * 100
            msg = self._make_message(subject)
            self.assertTrue(msg.as_string(maxheaderlen=76))

        def test_non_ascii_message_with_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Ц' * 100
            msg = self._make_message(subject)
            self.assertTrue(msg.as_string(maxheaderlen=76))

The ASCII test passes, but the non-ASCII one never finishes.
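Because the failing case never returns and keeps allocating memory, it may be safer to run it in a child process with a timeout. The following is only a sketch: the helper name finishes_within and the REPRO_SNIPPET string are made up for illustration, and the snippet is a trimmed-down variant of the test above that I am assuming triggers the same code path.

```python
import subprocess
import sys


def finishes_within(code, timeout_seconds):
    """Run `code` in a fresh interpreter; return False if it times out.

    subprocess.run() kills the child when the timeout expires, so a
    runaway loop cannot keep consuming memory afterwards.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical reduced reproduction (assumption: it exercises the same
# folding path as the unittest above on an affected interpreter).
REPRO_SNIPPET = (
    "import email.policy\n"
    "from email.mime.text import MIMEText\n"
    "msg = MIMEText('Some text', 'html')\n"
    "msg.policy = email.policy.SMTP\n"
    "msg['Subject'] = '\\u0426' * 100\n"
    "msg.as_string(maxheaderlen=76)\n"
)
```

Calling finishes_within(REPRO_SNIPPET, 10) should return False on an affected interpreter and True on one where the folding terminates. A short timeout keeps the memory growth in the child small.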

From what I can tell, the problem is in line 2728 of email/_header_value_parser.py:

        first_part = first_part[:-excess]

where excess is calculated from the encoded string (which is several times longer than the original one), but the slice truncates the original, non-encoded string. The problem arises when excess is greater than the length of first_part: the slice then yields an empty string. So it attempts to encode the exact same part of the header and fails in every iteration, instead appending an empty string to the list and encoding it as ' =?utf-8?q??='.
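To make the mismatch concrete, here is a simplified model of that step. It is not the library code: encode_word, naive_truncate and shrink_until_fits are illustrative names, base64 is used for every word, and the last function is just one possible way to guarantee progress, not necessarily how the problem should be fixed in the email package.

```python
import base64


def encode_word(text):
    """Encode text as an RFC 2047 base64 encoded word (simplified helper)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?utf-8?b?{payload}?="


def naive_truncate(first_part, remaining_space):
    """Model of the reported step: excess is measured on the encoded
    text, but the cut is applied to the unencoded text."""
    excess = len(encode_word(first_part)) - remaining_space
    if excess > 0:
        # When excess >= len(first_part) this slice is the empty string.
        first_part = first_part[:-excess]
    return first_part, excess


def shrink_until_fits(first_part, remaining_space):
    """Hypothetical alternative: drop one character at a time until the
    encoded word fits, always keeping at least one character so the
    caller consumes some input on every iteration."""
    while len(first_part) > 1 and len(encode_word(first_part)) > remaining_space:
        first_part = first_part[:-1]
    return first_part
```

With 'Ц' * 100 and 67 columns left on the line (76 minus the length of 'Subject: '), naive_truncate returns an empty string because the excess (213) is larger than the 100 characters available to cut, while shrink_until_fits keeps a non-empty prefix whose encoded form fits.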

What this amounts to is that it's now practically impossible to send emails with non-ASCII subjects without either disregarding the RFC recommendations and requirements for line length or risking hangs and unbounded memory growth.

Just like in https://bugs.python.org/msg322348, this behavior is new in Python 3.6. It also fails in 3.7 and 3.8.

Hello Grigory. I am using the patch in my project, and I have run into some problems with your fixes.

Source text: Subject: test Венесуэла собирается пересмотреть стоимость заключенных с Россией контрактов на поставку вооружений, а также отношения с Москвой в целом. Об этом заявил назначенный оппозицией специальный представитель Венесуэлы при Организации американских государств (ОАГ) Густаво Тарре Брисеньо на выступлении в вашингтонском Центре стратегических и международных исследований, передает

Encoded text using thunderbird: Subject: =?UTF-8?B?dGVzdCDQktC10L3QtdGB0YPRjdC70LAg0YHQvtCx0LjRgNCw0LXRgtGB?= =?UTF-8?B?0Y8g0L/QtdGA0LXRgdC80L7RgtGA0LXRgtGMINGB0YLQvtC40LzQvtGB0YLRjCA=?= =?UTF-8?B?0LfQsNC60LvRjtGH0LXQvdC90YvRhSDRgSDQoNC+0YHRgdC40LXQuSDQutC+0L0=?= =?UTF-8?B?0YLRgNCw0LrRgtC+0LIg0L3QsCDQv9C+0YHRgtCw0LLQutGDINCy0L7QvtGA0YM=?= =?UTF-8?B?0LbQtdC90LjQuSwg0LAg0YLQsNC60LbQtSDQvtGC0L3QvtGI0LXQvdC40Y8g0YEg?= =?UTF-8?B?0JzQvtGB0LrQstC+0Lkg0LIg0YbQtdC70L7QvC4g0J7QsSDRjdGC0L7QvCDQt9Cw?= =?UTF-8?B?0Y/QstC40Lsg0L3QsNC30L3QsNGH0LXQvdC90YvQuSDQvtC/0L/QvtC30LjRhtC4?= =?UTF-8?B?0LXQuSDRgdC/0LXRhtC40LDQu9GM0L3Ri9C5INC/0YDQtdC00YHRgtCw0LLQuNGC?= =?UTF-8?B?0LXQu9GMINCS0LXQvdC10YHRg9GN0LvRiyDQv9GA0Lgg0J7RgNCz0LDQvdC40Lc=?= =?UTF-8?B?0LDRhtC40Lgg0LDQvNC10YDQuNC60LDQvdGB0LrQuNGFINCz0L7RgdGD0LTQsNGA?= =?UTF-8?B?0YHRgtCyICjQntCQ0JMpINCT0YPRgdGC0LDQstC+INCi0LDRgNGA0LUg0JHRgNC4?= =?UTF-8?B?0YHQtdC90YzQviDQvdCwINCy0YvRgdGC0YPQv9C70LXQvdC40Lgg0LIg0LLQsNGI?= =?UTF-8?B?0LjQvdCz0YLQvtC90YHQutC+0Lwg0KbQtdC90YLRgNC1INGB0YLRgNCw0YLQtdCz?= =?UTF-8?B?0LjRh9C10YHQutC40YUg0Lgg0LzQtdC20LTRg9C90LDRgNC+0LTQvdGL0YUg0Lg=?= =?UTF-8?B?0YHRgdC70LXQtNC+0LLQsNC90LjQuSwg0L/QtdGA0LXQtNCw0LXRgg==?=

Text after decode and encode in python with our patch: Subject: test =?utf-8?b?0JLQtdC90LXRgdGD0Y3Qu9CwINGB0L7QsdC40YDQsNC10YLRgdGP?= =?utf-8?b?0L/=?utf-8?q?QtdGA0LXRgdC80L7RgtGA0LXRgtGM=3F=3D_=D1=81=D1=82?= =?utf-8?b?0L7QuNC80L7RgdGC0Ywg0LfQsNC60LvRjtGH0LXQvdC90YvRhSDRgSDQoNC+0YE=?= =?utf-8?b?0YHQuNC10Lkg0LrQvtC90YLRgNCw0LrRgtC+0LIg0L3QsCDQv9C+0YHRgtCw0LI=?= =?utf-8?b?0LrRgyDQstC+0L7RgNGD0LbQtdC90LjQuSwg0LAg0YLQsNC60LbQtSDQvtGC0L0=?= =?utf-8?b?0L7RiNC10L3QuNGPINGBINCc0L7RgdC60LLQvtC5INCyINGG0LXQu9C+0LwuINCe?= =?utf-8?b?0LEg0Y3RgtC+0Lwg0LfQsNGP0LLQuNC7INC90LDQt9C90LDRh9C10L3QvdGL0Lkg?= =?utf-8?b?0L7Qv9C/0L7Qt9C40YbQuNC10Lkg0YHQv9C10YbQuNCw0LvRjNC90YvQuSDQv9GA?= =?utf-8?b?0LXQtNGB0YLQsNCy0LjRgtC10LvRjCDQktC10L3QtdGB0YPRjdC70Ysg0L/RgNC4?= =?utf-8?b?0J7RgNCz0LDQvdC40LfQsNGG0LjQuCDQsNC80LXRgNC40LrQsNC90YHQutC40YU=?= =?utf-8?b?0LPQvtGB0YPQtNCw0YDRgdGC0LIgKNCe0JDQkykg0JPRg9GB0YLQsNCy0L4g0KI=?= =?utf-8?b?0LDRgNGA0LUg0JHRgNC40YHQtdC90YzQviDQvdCwINCy0YvRgdGC0YPQv9C70LU=?= =?utf-8?b?0L3QuNC4INCyINCy0LDRiNC40L3Qs9GC0L7QvdGB0LrQvtC8INCm0LXQvdGC0YA=?= =?utf-8?b?0LUg0YHRgtGA0LDRgtC10LPQuNGH0LXRgdC60LjRhSDQuCDQvNC10LbQtNGD0L0=?= =?utf-8?b?0LDRgNC+0LTQvdGL0YUg0LjRgdGB0LvQtdC00L7QstCw0L3QuNC5LCDQv9C10YA=?= =?utf-8?b?0LXQtNCw0LXRgg==?=

Result text: Subject: test Венесуэла собирается =?utf-8?b?0L/QtdGA0LXRgdC80L7RgtGA0LXRgtGM?= стоимость заключенных с Россией контрактов на поставку вооружений, а также отношения с Москвой в целом. Об этом заявил назначенный оппозицией специальный представитель Венесуэлы приОрганизации американскихгосударств (ОАГ) Густаво Тарре Брисеньо на выступлении в вашингтонском Центре стратегических и международных исследований, передает

If needed, I can write simple code to reproduce the bug.
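A round-trip check along the following lines might serve as a starting point for such a reproduction. It is a sketch only: subject_round_trips is a hypothetical helper, it uses email.policy.default rather than the patched code path discussed above, and on an affected interpreter a long non-ASCII subject may hang instead of returning.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def subject_round_trips(subject, max_line_length=78):
    """Serialize a message with the given subject, parse it back, and
    report whether the decoded subject equals the original.

    Returns a (matches, decoded_subject) tuple.
    """
    fold_policy = policy.default.clone(max_line_length=max_line_length)
    msg = EmailMessage(policy=fold_policy)
    msg["Subject"] = subject
    msg.set_content("Some text")

    # Folding and RFC 2047 encoding happen during serialization.
    wire = msg.as_string()

    parsed = message_from_string(wire, policy=policy.default)
    decoded = str(parsed["Subject"])
    return decoded == subject, decoded
```

Passing the source text quoted above as subject and printing the second element of the result would show whether words are left encoded or spaces between words are lost, as in the result text.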