msg204553 - (view) |
Author: Petri Lehtinen (petri.lehtinen) *  |
Date: 2013-11-27 06:10 |
From Illirgway, https://github.com/akheron/cpython/pull/2: Fix utf-8 bug, when symbol byte chain splits on 2 tcp-packets: error: uncaptured python exception, closing channel (:'utf-8' codec can't decode byte 0xd0 in position 1474: unexpected end of data [/usr/lib/python3.3/asyncore.py|read |
83] [/usr/lib/python3.3/asyncore.py |
handle_read_event |
msg204577 - (view) |
Author: (Illirgway) |
Date: 2013-11-27 12:34 |
"base64", "quoted-printable" and "7bit" messages only use ASCII range of characters so raw_message.decode('ascii') == raw_message.decode('latin1') == etc. == raw_message.decode('utf-8') due to internal representation of utf-8 characters P.S. Python 3.4.0 beta 1 Lib/smtpd has the same bug and needs the same patch to fix this issue |
|
|
msg204578 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2013-11-27 13:14 |
It occurs to me to wonder why smtpd is receiving 8bit data at all. It isn't advertising the 8BITMIME capability, so the client should be sending only 7bit data. See also issue 19662. I'm not sure that making utf-8 decoding work better is a good idea, but I suppose that since it is a data-dependent exception in a situation that would work with different data, we should go ahead and fix it. |
|
|
msg204580 - (view) |
Author: Petri Lehtinen (petri.lehtinen) *  |
Date: 2013-11-27 13:45 |
Is there a reason why this patch changes the decoding error handler to 'ignore' (from the default 'strict')? |
|
|
msg204585 - (view) |
Author: (Illirgway) |
Date: 2013-11-27 14:17 |
because "strict" throws an exception (for example, on raw win1251) which brings down production smtp daemon and "replace" embeds "ugly" characters into received (and passed) email messages |
|
|
msg221208 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-06-21 23:11 |
@Petri/Illirgway could one of you produce a new patch including the code change and a test for this? |
|
|
msg222622 - (view) |
Author: Milan Oberkirch (zvyn) * |
Date: 2014-07-09 15:33 |
I do not think that the proposed patch solves the bug because it silently changes binary input. With the patch for issue 19662 a proper solution to avoid this bug has been applied. The only thing left is to prevent the server to raise the exception when in legacy mode. Instead of deleting single bytes from the input (which is what .decode('utf-8', 'ignore') does) I would reply with an error to the client. The attached patch implements and tests this behaviour. |
|
|
msg222735 - (view) |
Author: Walter Dörwald (doerwalter) *  |
Date: 2014-07-11 10:31 |
I don't know anything about SMTP, but would it make sense to use an incremental decoder for decoding UTF-8? |
|
|
msg223023 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2014-07-14 13:09 |
As Milan said, the problem doesn't arise in 3.5 with decode_data=False, since there's no decoding. His patch doesn't actually fix the bug for the decode_data=True case, though, since the bug is a *valid* utf-8 sequence getting split across tcp buffers. To fix it, we would need to change the implementation of decode_data. Instead of conditionally decoding in collect_data, we'd need to postpone decoding to found_terminator. This would have the undesirable affect of changing what is in the received_lines attribute, which is why we didn't do it in the decode_data patch. Using an incremental decoder won't solve that problem, since it too would change what gets stored in received_lines. Since decode_data=True is really not a legitimate mode for smtpd (it is an historical accident/bug) and we are planning on removing it eventually, I think we should go ahead and apply Milan's patch as is, since it does improve the error reporting. The message would need to be adjusted though, since it can trigger on valid utf-8 data. It should say that smtpd should be run with decode_data=False in order to fix the decode problem. That would leave the bug as-is in 3.4, but a similar patch with an error message suggesting an upgrade to 3.5/decode_data=True could be applied. That feels a little weird, though :). |
|
|
msg223263 - (view) |
Author: Milan Oberkirch (zvyn) * |
Date: 2014-07-16 19:56 |
Note that in this (and my previous) patch the message is sent to the client (the idea was not to raise an exception). Maybe it would be better to raise an exception with the information you mentioned? |
|
|
msg223267 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2014-07-16 20:30 |
No, I think the error return is better. Crashing the server on bad input data (which, worse, could be perfectly valid if the server were standards compliant) is a bad thing, I think. That is, if someone has an application that actually works with decode_data=True, I think it is better if it doesn't crash just because of different input data or TCP packet boundaries, as long as the error is reported. I'm willing to be convinced otherwise, but that's how I'm leaning. |
|
|
msg223274 - (view) |
Author: Milan Oberkirch (zvyn) * |
Date: 2014-07-16 21:04 |
Agreed. It just feels a bit weird to send programming instructions for the server to the client (but it's probably the best we can do here). |
|
|
msg223278 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2014-07-16 21:17 |
Oh, that's a good point, I hadn't thought of that. Maybe I can come up with a better wording. |
|
|
msg309754 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2018-01-10 00:56 |
I'm closing this as won't fix since smtpd.py is deprecated and will likely not get any future development. Please see aiosmtpd as a much better third party replacement. |
|
|