msg335975 - (view) |
Author: Aaryn Tonita (aaryn.startmail) * |
Date: 2019-02-19 16:45 |
When using a policy for an EmailMessage that triggers folding (during serialization) of a fairly long display_name in an address field, the folding process removes the quotes from the display name breaking the semantics of the field. In particular, for a From address's display name like r'anything@anything.com ' + 'a' * MAX_LINE_LEN the folding puts anything@anything.com unquoted immediately after the From: header. For applications that do sender verification inside and then send it to an internal SMTP server that does not perform its own sender verification this could be considered a security issue since it enables sender spoofing. Receiving mail servers might be able to detect the broken header, but experiments show that the mail gets delivered. Simple demonstration (reproduced in attachment) of issue: SMTP_POLICY = email.policy.default.clone(linesep='\r\n', max_line_length=72) address = Address(display_name=r'anything@anything.com ' + 'a' * 72, addr_spec='dev@local.startmail.org') message = EmailMessage(policy=SMTP_POLICY) message['From'] = Address(display_name=display_name, addr_spec=addr_spec) # Trigger folding (via as_string()), then parse it back in. msg_string = message.as_string() msg_bytes = msg_string.encode('utf-8') msg_deserialized = BytesParser(policy=SMTP_POLICY).parsebytes(msg_bytes) # Verify badness from_hdr = msg_deserialized['From'] assert from_hdr != str(address) # But they should be equal... |
|
|
msg336010 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2019-02-19 19:55 |
Since Address itself renders it correctly (str(address)), the problem is going to take a bit of digging to find. I'm guessing the quoted_string atom is getting transformed incorrectly into something else at some point during the folding. |
|
|
msg336049 - (view) |
Author: Aaryn Tonita (aaryn.startmail) * |
Date: 2019-02-20 08:32 |
Hi David, the problem is in email._header_value_parser._refold_parse_tree. Specifically, when the parsetree renders too long, it recursively gets newparts = list(part) (the children). When it does this to a BareQuotedString node, the child nodes are unquoted and unescaped and it just happily serializes these. I thought I had attached a file that monkey patches the _refold_parse_tree function with a fixed version... let me try again. |
|
|
msg336093 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2019-02-20 14:40 |
I'm afraid I don't have time to parse through the file you uploaded. Can you produce a pull request or a diff showing your fix? And ideally some added tests :) But whatever you can do is great, if you don't have time maybe someone else will pick it up (I unfortunately don't have time, though I should be able to do a review of a PR). |
|
|
msg336109 - (view) |
Author: Aaryn Tonita (aaryn.startmail) * |
Date: 2019-02-20 16:07 |
Sure thing, I'll try to produce something tomorrow. |
|
|
msg336667 - (view) |
Author: Aaryn Tonita (aaryn.startmail) * |
Date: 2019-02-26 13:34 |
Sorry about the delay. I opened pull request https://github.com/python/cpython/pull/12054 for this. Let me know if you need anything else. |
|
|
msg336668 - (view) |
Author: Aaryn Tonita (aaryn.startmail) * |
Date: 2019-02-26 13:36 |
Although I am not personally interested in backporting a fix for this issue, anyone that experiences this issue in python 3.5 can execute the following monkey patch to solve the issue: def _fix_issue_36041_3_5(): from email._header_value_parser import QuotedString, ValueTerminal, quote_string import email._header_value_parser class BareQuotedString(QuotedString): token_type = 'bare-quoted-string' def __str__(self): return quote_string(''.join(str(x) for x in self)) @property def value(self): return ''.join(str(x) for x in self) @property def parts(self): parts = list(self) escaped_parts = [] for part in parts: if isinstance(part, ValueTerminal): escaped = quote_string(str(part))[1:-1] escaped_parts.append(ValueTerminal(escaped, 'ptext')) else: escaped_parts.append(part) # Add quotes to the first and last parts. escaped_parts[0] = ValueTerminal('"' + str(escaped_parts[0]), 'ptext') escaped_parts[-1] = ValueTerminal(str(escaped_parts[-1] + '"'), 'ptext') return escaped_parts email._header_value_parser.BareQuotedString = BareQuotedString |
|
|