msg22785 - (view) |
Author: Scott Dossey (sdossey) |
Date: 2004-10-19 19:51 |
The following email address is legal according to the RFC:

    <"\"quoted string\" somebody"@somedomain.com">

I've got a Python mail-handling back-end server that handles mail coming in from Postfix. Postfix properly accepts mail of this type, but when it goes to relay this through my Python server it fails. The problem is inside smtplib.py, in "quoteaddr". Here's a source code snippet:

    def quoteaddr(addr):
        """Quote a subset of the email addresses defined by RFC 821.

        Should be able to handle anything rfc822.parseaddr can handle.
        """
        m = (None, None)
        try:
            m = rfc822.parseaddr(addr)[1]
        except AttributeError:
            pass
        if m == (None, None):  # Indicates parse failure or AttributeError
            # something weird here.. punt -ddm
            return "<%s>" % addr

Basically, quoteaddr ends up wrapping whatever parseaddr returns to it in angle brackets and sending that out on the wire for the RCPT TO command. However, if I call rfc822.parseaddr it does bizarre things to email addresses. For instance, the following calls all yield the same thing (some not surprisingly):

    rfc822.parseaddr('""test" test"@test.com')
    rfc822.parseaddr('"\"test\" test"@test.com')
    rfc822.parseaddr('"\\"test\\" test"@test.com')
    rfc822.parseaddr('"\\\"test\\\" test"@test/com')

The above all yield:

    ('', '""test" test"@test.com')

whereas

    rfc822.parseaddr('"\\\\"test\\\\" test"@test/com')

yields the VERY surprising result:

    ('', '"\\"test\\\\" test"@test.com')

I submitted this as a new bug report even though there are two similar bugs regarding parseaddr, because it is a slightly separate issue.

-Scott Dossey <seveirein /at/ yahoo.com> |
|
|
msg117778 - (view) |
Author: Jeffrey Finkelstein (jfinkels) * |
Date: 2010-10-01 05:41 |
I can confirm this bug. Attached is the test case. |
|
|
msg117818 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-10-01 17:31 |
It does appear as though parseaddr is dropping quoting information from the returned parsed address. Fixing this is likely to create backward compatibility issues, and I'm very curious to know why parseaddr drops the quoting info.

Note that I do not observe the change from test/com to test.com, so I'm assuming that was a typo and ignoring that part (or it represents a bug that is already fixed).

The "weird" example is actually consistent with the rest of parseaddr's behavior, if you understand that behavior as turning quoted pairs inside quoted strings into their literal value, but leaving the quotes around the quoted string(s) in place. Consider the example:

    parseaddr('"\\\\"test\\\\" test"@test.com')

If we remove the Python quoting from this input string we have:

    "\\"test\\" test"@test.com

Interpreting this according to RFC rules, we first have a quoted string

    "\\"

containing a quoted pair (\\). The quoted pair resolves to a single \. Then we have the unquoted text

    test\\

which parseaddr copies literally (I'm not sure if that is strictly RFC compliant, but given that we are supposed to be liberal in what we accept, it is as reasonable a thing to do as any). Finally we have another quoted string

    " test"

So putting those pieces together according to the rules above, we end up with:

    "\"test\\" test"@test.com

which is the observed output once you remove the Python quoting. So, parseaddr is working as designed. The question is, what is the design decision behind resolving the quoted pairs but leaving the quotes? |
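The reading in this message can be modeled in a few lines of Python. This is only a sketch of the behavior as described here (resolve quoted pairs inside quoted strings, keep the surrounding quotes, copy everything else literally). `resolve_quoted_pairs` is a hypothetical helper written for illustration, not the code in rfc822 or email.utils.

```python
def resolve_quoted_pairs(address):
    """Hypothetical model of the behavior described above; not the stdlib code."""
    out = []
    in_quoted_string = False
    i = 0
    while i < len(address):
        ch = address[i]
        if ch == '"':
            # Quote characters are kept and toggle the quoted-string state.
            in_quoted_string = not in_quoted_string
            out.append(ch)
        elif ch == '\\' and in_quoted_string and i + 1 < len(address):
            # A quoted pair inside a quoted string resolves to its literal value.
            out.append(address[i + 1])
            i += 1
        else:
            # Everything else, including backslashes outside quotes, is copied as-is.
            out.append(ch)
        i += 1
    return ''.join(out)


# Raw strings are used so the text matches the "Python quoting removed" form.
raw_input_address = r'"\\"test\\" test"@test.com'
modeled_output = resolve_quoted_pairs(raw_input_address)
print(modeled_output)
```

Under this model the input with doubled backslashes produces the same text as the reported "surprising" result, which supports the reading that the function was consistently resolving quoted pairs while keeping the quotes.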
|
|
msg117860 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-10-02 04:06 |
After working my way through the code I no longer think that parseaddr is working as designed. I think that this is a bug, and that there is a missing call to quote in getaddrspec. Attached is a revised set of unit tests and a fix. The full python test suite passes with this fix in place, but note that initially I made a mistake in the patch and running test_email passed...that is, before the attached tests there were no tests of parseaddr in the email test suite. I don't know if this patch is safe for backport, but I'm inclined that way. It is hard to see how 3rd party code could be compensating for this bug, since it loses quoting information that doesn't appear to be algorithmically recoverable. |
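To illustrate what re-quoting does, here is a small sketch using the documented `email.utils.quote` helper, which escapes backslashes and double quotes. It is not the attached patch (the patch is not reproduced in this thread), and whether the fix calls this exact function is an assumption. The variable names are made up for the example.

```python
from email.utils import quote

# Literal content of a quoted string after its quoted pairs were resolved.
resolved_local_part = '"test" test'

# quote() turns each backslash into two backslashes and each '"' into '\"'.
requoted_local_part = quote(resolved_local_part)

# Wrapping the re-quoted content in double quotes gives back a valid quoted string.
address = '"%s"@test.com' % requoted_local_part
print(address)
```

Without the re-quoting step the inner double quotes stay bare, which is the unescaped form the original report observed.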
|
|
msg117884 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-10-02 16:28 |
Fix committed to py3k in r85179, 3.1 in r85170, and 2.7 in r85181. I modified the unit tests, deleting the ones that were redundant because they were just two different python spellings of the same input string, and adding a comment about the third test case's quoting pattern. |
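The redundancy among the test inputs can be checked directly in Python. This is a sketch using the literals from the original report, with test.com substituted for the test/com typo. Which specific tests were deleted is not stated in this thread, so the pairing below is an assumption.

```python
# Inside a single-quoted literal, \" is just another way to write ".
spelling_1 = '""test" test"@test.com'
spelling_2 = '"\"test\" test"@test.com'

# \\ is one backslash, so both of these contain a real backslash before each inner quote.
spelling_3 = '"\\"test\\" test"@test.com'
spelling_4 = '"\\\"test\\\" test"@test.com'

print(spelling_1 == spelling_2)  # True: same string, no backslashes at all
print(spelling_3 == spelling_4)  # True: same string, with quoted pairs
print(spelling_1 == spelling_3)  # False: these are genuinely different inputs
```

So the four calls in the report exercise only two distinct input strings.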
|
|